(54) Title: CONSERVED AND SPECIFIC STREPTOCOCCAL GENOMES 

(57) Abstract: The invention relates to polynucleotides which are conserved or specific to one or more species of Streptococcus, 
Streptococcus species serotypes, and/or serotype isolates. In particular, the invention relates to polynucleotides from Streptococcus 
which are conserved or specific to one or more of the species of S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group 
A streptococcus" or "GAS"), and S. agalactiae ("group B streptococcus" or "GBS"). The invention further relates to polynucleotides 
which are conserved or specific to one or more Streptococcal species serotypes, such as GBS serotypes Ia, Ib, II, III, IV, V, VI, VII, 
and VIII. The invention still further relates to polynucleotides which are conserved or specific to one or more clinical isolates of a 
Streptococcus species. 
CONSERVED AND SPECIFIC STREPTOCOCCAL GENOMES 

CROSS REFERENCE TO RELATED APPLICATIONS 

This application claims priority of U.S. provisional patent application Serial No. 
60/406,237, filed August 26, 2002, U.S. provisional patent application Serial No. 60/406,676, 
filed August 27, 2002 and U.S. provisional patent application Serial No. 60/406,757, filed 
August 28, 2002. 

FIELD OF THE INVENTION 

The invention relates to polynucleotides which are conserved or specific to one or more 
species of Streptococcus, Streptococcus species serotypes, and/or serotype isolates. The 
conserved or specific genomic regions can be used to identify, screen and develop vaccines and 
other treatments for Streptococcal infections and can be used in diagnostic assays to diagnose 
and identify Streptococcal infections. 

BACKGROUND OF THE INVENTION 

The genus Streptococcus consists of Gram-positive, chain-forming, spherical bacterial 
cells. Three species of clinical interest are S. pneumoniae ("pneumococcus" or "S. pn."), 
S. pyogenes ("group A streptococcus" or "GAS") and S. agalactiae ("group B streptococcus" or 
"GBS"). Infections with these three pathogenic streptococci can lead to conditions including 
pharyngitis, toxic shock syndrome and necrotizing fasciitis. 

Once thought to infect only cows, GBS is now known to cause serious disease, 
bacteraemia and meningitis in immunocompromised individuals and neonates. There are two 
known types of neonatal infection. The first (early onset, usually within 5 days of birth) is 
manifested by bacteraemia and infection. It is generally contracted vertically as a baby passes 
through the birth canal. GBS is thought to colonize the vagina of about 25% of young women; 
approximately 1% of infants born via a vaginal birth to colonised mothers will become infected. 
Mortality resulting from these infections is between 50% and 70%. The second type of neonatal 
infection is a meningitis that occurs 10 to 60 days after birth. If pregnant women are vaccinated 
with type III capsular polysaccharide so that their infants are passively immunised, the incidence 
of late-onset meningitis is generally reduced, although not entirely eliminated. 
The "B" in "GBS" refers to the Lancefield classification, which is based on the 
antigenicity of a carbohydrate which is soluble in dilute acid and called the C carbohydrate. 
Lancefield identified 13 types of C carbohydrate, designated A to O, that could be serologically 
differentiated. The organisms that most commonly infect humans are found in groups A, B, D 
and G. Within group B, strains can be divided into at least 9 serotypes (Ia, Ib, II, III, IV, V, VI, 
VII and VIII) based on the structure of their polysaccharide capsule. Further categories based 
on, for example, the expression of certain proteins have also been developed. 

GBS strains of polysaccharide capsule Type V were rarely isolated before the mid-1980s 
but now account for approximately one-third of clinical isolates in the US. Type V is the most 
common capsular serotype associated with invasive infection in nonpregnant adults, and the 
emergence of Type V strains over the past decade has been temporally linked to an increase in 
GBS disease in this population. 

Group A streptococcus is a frequent human pathogen, estimated to be present in 5-15% of 
normal individuals without signs of disease. When host defences are compromised, when the 
organism is able to exert its virulence, or when it is introduced into vulnerable tissues or hosts, 
however, an acute infection occurs. Diseases include puerperal fever, scarlet 
fever, erysipelas, pharyngitis, impetigo, necrotising fasciitis, myositis and streptococcal toxic 
shock syndrome. 

Pneumococcus is the most common cause of acute respiratory infection and otitis media 
and is estimated to result in over 3 million deaths in children every year worldwide from 
pneumonia, bacteremia, or meningitis. Even more deaths occur among elderly people, among 
whom S. pn. is the leading cause of community-acquired pneumonia and meningitis. Since 
1990, the proportion of penicillin-resistant isolates has increased from 1-5% to 25-80%, 
and many strains are now resistant to commonly prescribed antibiotics such as 
penicillin, macrolides, and fluoroquinolones. See Tettelin, et al. (2001) Science 293, 498-506. 

The complete genomic sequence of a virulent isolate of S. pneumoniae was published by 
Tettelin, et al. (2001) Science 293, 498-506 and is available at the TIGR website at 
http://www.tigr.org as well as in GenBank (available through the PubMed website at 
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi). The genomic sequence, the Tettelin article and 
its published supplemental material are incorporated herein by reference in their entirety. 

The complete genomic sequence of an M1 strain of S. pyogenes was published by 
Ferretti, et al. (2001) Proc Natl Acad Sci USA 98, 4658-4663 and is available at the TIGR 
website at http://www.tigr.org. The genomic sequence, the Ferretti article and its published 
supplemental materials are incorporated herein by reference in their entirety. 
The complete genomic sequence of a serotype V strain of S. agalactiae (type V strain 
2603 V/R) was published on August 28, 2002 at GenBank Accession no. AE009948 (available 
through PubMed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) and/or was available on the 
same day at the TIGR website at http://www.tigr.org. Most of this sequence is also available in 
PCT International Patent Application Publication WO 02/34771. The genomic sequence, the 
Tettelin article and its published supplemental materials are incorporated herein by reference in 
their entirety. 

Current treatments for Streptococcal infections include both antibiotics and prophylactic 
vaccination. Current vaccines, particularly with respect to GBS, suffer from poor 
immunogenicity, while the emergence of antibiotic-resistant strains has lessened the 
effectiveness of currently used antibiotics. Accordingly, there is an increasing need for the 
development of new vaccines and antibiotics (as well as other small-molecule bacterial 
inhibitors) to help prevent and treat Streptococcal infections. 

Applicants have identified regions of the Streptococcal genomes which can be used to 
identify and develop new vaccines and treatments for Streptococcal infections. Specifically, 
Applicants have identified polynucleotides of the Streptococcal genome which are conserved or 
specific to Streptococcal species, species serotypes, and/or specific serotype isolates. These 
polynucleotides and their expressed polypeptides can be used to screen, develop and design new 
vaccines, antibiotics and other small-molecule bacterial inhibitors. These polynucleotides and 
their expressed polypeptides can further be used to diagnose and identify Streptococcal infections. 

SUMMARY OF THE INVENTION 

The invention relates to polynucleotides which are conserved or specific to one or more 
species of Streptococcus, Streptococcus species serotypes, and/or serotype isolates. In particular, 
the invention relates to polynucleotides from Streptococcus which are conserved or specific to 
one or more of the species of S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group 
A streptococcus" or "GAS"), and S. agalactiae ("group B streptococcus" or "GBS"). The 
invention further relates to polynucleotides which are conserved or specific to one or more 
Streptococcal species serotypes, such as GBS serotypes Ia, Ib, II, III, IV, V, VI, VII, and VIII. 
The invention still further relates to polynucleotides which are conserved or specific to one or 
more clinical isolates of a Streptococcus species. 

The invention is based on the identification of the following Subsets of genes. Genes 
falling within each subset are described with respect to referenced tables, lists, and/or figures (in 
particular the CGH map depicted in Figure 1). 
The following Subsets relate to the GBS genome: 

GBS Subset 1: 1060 GBS genes which have homologues with GAS and with 
pneumococcus (Table 8); 

GBS Subset 2: 225 GBS genes which have homologues with GAS, but not with 
pneumococcus (Table 10); 

GBS Subset 3: 176 GBS genes which have homologues with pneumococcus but not 
with GAS (Table 9); 

GBS Subset 4: 683 GBS genes which do not have homologues with GAS or 
pneumococcus (specific to GBS vs GAS and pneumococcus) (Table 11). 

The invention is based on the identification of the following subsets of genes within the 
GAS genome: 

GAS Subset 1: 1006 GAS genes which have homologues with GBS and with 
pneumococcus (Table 33); 

GAS Subset 2: 212 GAS genes which have homologues with GBS but do not have 
homologues with pneumococcus (Table 34); 

GAS Subset 3: 62 GAS genes which have homologues with pneumococcus but do not 
have homologues with GBS (Table 35); 

GAS Subset 4: 416 GAS genes which do not have homologues with either GBS or 
pneumococcus. This Subset can be determined by subtracting the above subsets from the 
published genome. 

The invention is based on the identification of the following subsets of genes within the 
pneumococcus genome: 

Spn Subset 1: 1034 Spn genes which have homologues with GBS and GAS (Table 36); 

Spn Subset 2: 195 Spn genes which have homologues with GBS but do not have 
homologues with GAS (Table 37); 

Spn Subset 3: 74 Spn genes which have homologues with GAS but do not have 
homologues with GBS (Table 38); 

Spn Subset 4: 836 Spn genes which do not have homologues with either GBS or 
GAS. This Subset can be determined by subtracting the above Subsets from the 
published genome. 
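The four-way split used for each genome above, and the determination of Subset 4 by subtraction, can be illustrated with a minimal Python sketch. It assumes that homology calls against the two comparison species have already been made (for example by a sequence-similarity search, which is not shown) and are supplied as per-gene yes/no flags; the function names and gene identifiers are hypothetical.

```python
from typing import Dict, Set


def partition_by_homologues(
    genes: Set[str],
    homologue_in_x: Dict[str, bool],
    homologue_in_y: Dict[str, bool],
) -> Dict[int, Set[str]]:
    """Assign each gene of one genome to Subset 1-4 from two homology calls.

    For GBS, X = GAS and Y = pneumococcus; for GAS, X = GBS and
    Y = pneumococcus; for Spn, X = GBS and Y = GAS.
    Genes missing from a call table are treated as having no homologue.
    """
    subsets: Dict[int, Set[str]] = {1: set(), 2: set(), 3: set(), 4: set()}
    for gene in genes:
        in_x = homologue_in_x.get(gene, False)
        in_y = homologue_in_y.get(gene, False)
        if in_x and in_y:
            subsets[1].add(gene)  # homologues in both comparison species
        elif in_x:
            subsets[2].add(gene)  # homologue in X only
        elif in_y:
            subsets[3].add(gene)  # homologue in Y only
        else:
            subsets[4].add(gene)  # no homologue in either species
    return subsets


def subset4_by_subtraction(
    genome: Set[str], s1: Set[str], s2: Set[str], s3: Set[str]
) -> Set[str]:
    """Subset 4 = all predicted genes of the published genome minus Subsets 1-3."""
    return genome - (s1 | s2 | s3)
```

Because the four subsets are disjoint and together cover every predicted gene, the subtraction route and the direct classification give the same Subset 4.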

The invention further provides polynucleotides which are conserved or specific to 
Streptococcus based on a comparison with a wide range of published bacterial genomes. The 
following additional Subsets are provided: 
GBS Subset 1(a): Of the 1060 GBS genes which have homologues in both GAS and 
pneumococcus, 12 do not have homologues with any of the other bacterial genomes published 
at the time of the invention (i.e., GBS Subset 1(a) is specific to Streptococcus 
vs non-Streptococcus published genomes). The 12 GBS ORFs are listed in Table 3. 

GBS Subset 2(a): This Subset comprises GBS genes which have homologues with 
GAS, but not with pneumococcus or any other published bacterial genomes at the time of the 
invention. 

GBS Subset 3(a): This Subset comprises GBS genes which have homologues with 
pneumococcus, but not with GAS or any other published bacterial genomes at the time of the 
invention. 

GBS Subset 4(a): Of the 683 GBS genes which do not have homologues in either GAS 
or pneumococcus, 315 also do not have homologues with any of the other 
published bacterial genomes. These include six proteins predicted to be anchored on the cell 
wall (SAG0677, SAG0771, SAG1052, SAG1331, SAG1473, and SAG1168), three of the 
capsule-related genes (SAG1163, SAG1167, and SAG1168), six transcriptional regulators, and 
four genes of the cyl operon (SAG0663-SAG0673) essential for GBS hemolytic activity and 
production of pigment. See Pritzlaff et al. (2001) Mol Microbiol 39, 236-247. The remaining 
proteins among the 315 include 240 hypothetical proteins with no similarity to other proteins in 
databases. 

Many of the 315 genes specific to S. agalactiae are located in regions likely to constitute 
mobile genetic elements. Two of these regions resemble prophages (SAG0545-SAG0610 and 
SAG1835-SAG1885) and display a mosaic structure, with segments most similar to different 
bacteriophages, a pattern that suggests frequent recombination events. PblA and PblB are 
adhesins from an S. mitis prophage, where they contribute to endocarditis by binding to human 
platelets (see Bensing, et al. (2001) Infect Immun 69, 6186-6192; Bensing, et al. (2001) Infect 
Immun 69, 1373-1380). Their orthologs in S. agalactiae are located on separate prophages and 
display a different protein structure. Another region (SAG1247-SAG1299) encodes a putative 
conjugative transposon that carries genes for cadmium efflux and mercury resistance. 

GAS Subset 1(a): This Subset comprises GAS genes which have homologues with GBS 
and with pneumococcus, but do not have homologues with any of the other published bacterial 
genomes at the time of the invention. 

GAS Subset 2(a): This Subset comprises GAS genes which have homologues with GBS 
but do not have homologues with pneumococcus or any of the other published bacterial genomes 
at the time of the invention. 
GAS Subset 3(a): This Subset comprises GAS genes which have homologues with 
pneumococcus but do not have homologues with GBS or any of the other published bacterial 
genomes at the time of the invention. 

GAS Subset 4(a): This Subset comprises GAS genes which do not have homologues 
with either GBS or pneumococcus or with any of the other published bacterial genomes at the 
time of the invention. 

Spn Subset 1(a): This Subset comprises Spn genes which have homologues with GBS 
and GAS but which do not have homologues with any of the other published bacterial genomes 
at the time of the invention. 

Spn Subset 2(a): This Subset comprises Spn genes which have homologues with GBS 
but do not have homologues with GAS or with any of the other published bacterial genomes at 
the time of the invention. 

Spn Subset 3(a): This Subset comprises Spn genes which have homologues with GAS 
but do not have homologues with GBS or with any of the other published bacterial genomes at 
the time of the invention. 

Spn Subset 4(a): This Subset comprises Spn genes which do not have homologues with 
either GBS or GAS or with any of the other published bacterial genomes at the time of 
the invention. 

The invention also provides polynucleotides which are conserved or specific to GBS 
serotypes and/or clinical isolates. Applicants have sequenced 19 GBS genes from a variety of 
GBS serotypes in 11 different clinical isolates. The sequences of these genes and their 
alignments are set forth in Tables 13-31. Polynucleotide and polypeptide sequences which are 
specific or conserved across one or more clinical isolates can be identified using these 
alignments. The following additional subsets are provided: 
GBS Subset 1(b): Of the 1060 GBS genes which have homologues with GAS and with 
pneumococcus, 47 vary among the 11 clinical isolates (GBS Subset 1(b)(i)) and 
1013 are conserved across the 11 clinical isolates (GBS Subset 1(b)(ii)). 
These lists can be determined by comparing the genes listed in Table 8 with the Comparative 
Genome Hybridization in Figure 1. 

GBS Subset 2(b): Of the 225 GBS genes which have homologues with GAS, but not with 
pneumococcus, 44 vary among the 11 clinical isolates (GBS Subset 2(b)(i)) and 
181 are conserved across the 11 clinical isolates (GBS Subset 2(b)(ii)). 
These lists can be determined by comparing the genes listed in Table 10 with the Comparative 
Genome Hybridization in Figure 1. 
GBS Subset 3(b): Of the 176 GBS genes which have homologues with pneumococcus but not 
with GAS, 44 vary among the 11 clinical isolates (GBS Subset 3(b)(i)) and 132 are 
conserved across the 11 clinical isolates (GBS Subset 3(b)(ii)). These lists can be 
determined by comparing the genes listed in Table 9 with the Comparative Genome 
Hybridization in Figure 1. 

GBS Subset 4(b): Of the 683 GBS genes which do not have homologues with GAS or 
pneumococcus, 260 vary among the 11 clinical isolates (GBS Subset 4(b)(i)) and 423 
are conserved across the 11 clinical isolates (GBS Subset 4(b)(ii)). These lists 
can be determined by comparing the genes listed in Table 11 with the Comparative Genome 
Hybridization in Figure 1. GBS Subset 4(b)(ii) also includes the GBS ORFs listed in Table 12 
receiving a under the column "GBS specific". 
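The (i)/(ii) split applied to each of these subsets can be expressed as a simple filter over the hybridization calls. The sketch below is an illustration only: it assumes each gene's Comparative Genome Hybridization result has already been reduced to one present/absent call per clinical isolate, and it treats a gene as varying if it is called absent in at least one isolate (mirroring the wording used for GBS Subset 12). The function and gene names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def split_by_cgh(
    subset: Set[str],
    cgh_present: Dict[str, List[bool]],
) -> Tuple[Set[str], Set[str]]:
    """Return (varying, conserved) genes of a subset.

    cgh_present maps a gene to one present/absent call per clinical isolate.
    A gene counts as conserved only if it is detected in every isolate tested.
    """
    varying: Set[str] = set()
    conserved: Set[str] = set()
    for gene in subset:
        if all(cgh_present[gene]):
            conserved.add(gene)
        else:
            varying.add(gene)
    return varying, conserved
```

Applied to GBS Subset 1 with 11 calls per gene, the first returned set would play the role of GBS Subset 1(b)(i) and the second of GBS Subset 1(b)(ii); the two parts always add up to the parent subset (for example, 47 + 1013 = 1060).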

An additional 63 GBS genes have been sequenced and compared in 2-11 clinical 
isolates. These sequences and their alignments are provided in Tables 40-89. Polynucleotide 
and polypeptide sequences which are specific or conserved across one or more clinical isolates 
can be identified using these alignments. 

The invention further provides polynucleotides which are likely recent genomic 
duplications in GBS. These duplications include glycosyl transferases, sortases, proteins 
anchored on the cell wall, β-lactam resistance factors, and many hypothetical proteins. The GBS 
genes are listed in Table 4 (GBS Subset 5). 

The invention is also based on the identification of a cluster of 13 adjacent genes 
(SAG1410-SAG1424) which is believed to encode enzymes required for synthesis of the group 
B carbohydrate, a complex multiantennary structure of rhamnose, glucitol phosphate, 
N-acetylglucosamine, and galactose (GBS Subset 6). Predicted proteins encoded within this 
cluster include seven putative glycosyltransferases, four of which are similar to 
rhamnosyltransferases in other streptococcal species; a putative dTDP-L-rhamnose synthase; and 
proteins involved in glucitol synthesis. All nine recognized GBS capsular polysaccharide types 
contain sialic acid residues as part of their repeating unit structure, a feature that contributes to 
virulence by inhibiting activation of the alternative complement pathway. See Edwards et al. 
(1982) J Immunol 128, 1278-1283. 

The type V capsular polysaccharide gene cluster consists of 18 genes (GBS Subset 
6(a)). A region of glycosyltransferases and related proteins (SAG1162-SAG1170) that direct 
the synthesis of the type V polysaccharide repeat unit is flanked on either side by genes that are 
conserved in all known GBS capsule serotypes. Downstream of this region are genes that 
encode enzymes for the biosynthesis and activation of sialic acid (SAG1158-SAG1161). 
Upstream of the serotype-specific region are genes (SAG1171-SAG1175) found not only in all 
nine GBS capsular serotypes but also in a variety of other polysaccharide-producing 
streptococci. 

The invention is also based on the identification of GBS ORFs predicted to encode 
proteins carrying a signal peptide (GBS Subset 7). These GBS ORFs are listed in Table 2 
receiving a under the column "signal peptide". 

The invention is also based on the identification of GBS ORFs predicted to encode 
proteins which are anchored on the cell wall through an LPxTG motif (GBS Subset 8). These 
GBS ORFs are listed in Table 2 receiving a under the column "sortase motif". 
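As an illustration of how the LPxTG criterion for GBS Subset 8 might be applied to a predicted protein sequence, the sketch below scans for the pentapeptide with a regular expression. It is a minimal heuristic under stated assumptions, not the prediction method used to build Table 2: restricting the search to the C-terminal 50 residues is an arbitrary choice, and a fuller screen would also check for the hydrophobic segment and charged tail that normally follow the motif. The function name is hypothetical.

```python
import re
from typing import List

# The "x" in LPxTG may be any residue, hence the regex wildcard.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(protein: str, c_terminal_window: int = 50) -> List[int]:
    """Return 0-based start positions of LPxTG motifs near the C terminus.

    The window size is an illustrative assumption, not a published cutoff.
    """
    seq = protein.upper()
    offset = max(0, len(seq) - c_terminal_window)
    return [offset + m.start() for m in LPXTG.finditer(seq[offset:])]
```

For a hypothetical sequence ending in LPETG followed by a short hydrophobic stretch and a basic tail, the function returns the index of the motif's leucine; the same motif placed far from the C terminus is ignored.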
The invention is also based on the identification of GBS ORFs predicted to encode 
lipoproteins (GBS Subset 9). These GBS ORFs are listed in Table 2 receiving a under the 
column "lipoprotein". 
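Lipoprotein prediction of the kind behind GBS Subset 9 commonly relies on the "lipobox", a short motif at the end of the signal peptide whose cysteine becomes the lipid-modified N-terminal residue of the mature protein. The sketch below uses the frequently cited consensus [LVI][ASTVI][GAS]C and, as an assumption, searches only the first 40 residues; it is an illustration, not the method actually used to populate Table 2, and the function name is hypothetical.

```python
import re
from typing import Optional

# Lipobox consensus; the final cysteine is the predicted lipidation site.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def lipobox_cysteine(protein: str, n_terminal_window: int = 40) -> Optional[int]:
    """Return the 0-based index of the lipobox cysteine, or None if absent.

    Only the N-terminal window is searched; the window size is an assumption.
    """
    match = LIPOBOX.search(protein.upper()[:n_terminal_window])
    return match.end() - 1 if match else None
```

A sequence-only motif search like this tends to over-predict, so in practice it would be combined with a signal-peptide prediction such as the one used for GBS Subset 7.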

The invention is also based on the identification of two GBS ORFs predicted to encode 
enzymes related to metabolism (GBS Subset 10). These GBS ORFs include a putative 
pullulanase (SAG1216) and a neuraminidase-related protein (SAG1932). 

The invention is also based on the identification of GBS ORF's predicted to encode 
proteins exposed on the cell surface (GBS Subset 11). These GBS ORF's are listed in Table 2 
receiving a "+" under the column "FACS". 

The invention is also based on the identification of 401 GBS ORFs from GBS strain 
2603 V/R which were not detected in at least one of the 11 other clinical isolates tested (GBS 
Subset 12). See the Comparative Genome Hybridization in Figure 1. 364 of these 401 ORFs 
correspond to 15 regions containing more than 5 contiguous genes. Each region is identified in 
Figure 1 by a numbered yellow bullet. Each region comprises a subset as defined below: 

Region 1: GBS Subset 12(a). This region is unique to GBS (SAG0218-SAG0238). 
This region is a possible plasmid or remnant of a phage and contains mostly hypothetical 
proteins. 

Region 2: GBS Subset 12(b) 

Region 3: GBS Subset 12(c) 

Region 4: GBS Subset 12(d) 
Region 5: GBS Subset 12(e) 

Region 6: GBS Subset 12(f) 

Region 7: GBS Subset 12(g) 
Region 8: GBS Subset 12(h). This region is specific to GBS (SAG1018-SAG1037). 
This region comprises 20 proteins of unknown function, most of which are predicted to be 
membrane-associated or secreted, and displays an atypical nucleotide composition. 

Region 9: GBS Subset 12(i) 

Region 10: GBS Subset 12(j) 

Region 11: GBS Subset 12(k) 

Region 12: GBS Subset 12(l) 

Region 13: GBS Subset 12(m) 

Region 14: GBS Subset 12(n). This region is unique to GBS and spans 33 genes 
(SAG1989-SAG2021), including 25 proteins of unknown function, some of which carry a cell-wall 
anchor. 

Region 15: GBS Subset 12(o). 
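The grouping of variable ORFs into regions of more than 5 contiguous genes, as done for GBS Subset 12 above, can be sketched as a single scan along the ORFs in genome order. This is an illustration under stated assumptions: ORFs are supplied in chromosomal order, "contiguous" means adjacent in that order, wrap-around of the circular chromosome is ignored, and the function name and ORF identifiers in the example are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def contiguous_regions(
    ordered_orfs: List[str],
    variable: Set[str],
    min_len: int = 6,
) -> List[Tuple[str, str, int]]:
    """Return (first ORF, last ORF, length) for each maximal run of adjacent
    variable ORFs with at least min_len genes ("more than 5" means 6)."""
    regions: List[Tuple[str, str, int]] = []
    run_start: Optional[int] = None
    for i, orf in enumerate(ordered_orfs):
        if orf in variable:
            if run_start is None:
                run_start = i  # a new run of variable ORFs begins here
        elif run_start is not None:
            if i - run_start >= min_len:
                regions.append((ordered_orfs[run_start], ordered_orfs[i - 1], i - run_start))
            run_start = None
    # Close a run that extends to the last ORF in the list.
    if run_start is not None and len(ordered_orfs) - run_start >= min_len:
        regions.append((ordered_orfs[run_start], ordered_orfs[-1], len(ordered_orfs) - run_start))
    return regions
```

For example, with 20 hypothetical ORFs SAG0000-SAG0019 where SAG0002-SAG0008, SAG0010-SAG0012 and SAG0014-SAG0019 are variable, only the first and last runs (7 and 6 genes) are reported.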

The invention is also based on the identification of clusters of GBS genes as set forth in 
Figure 5 and Table 6. In Figure 5, the presence of a particular gene or gene cluster is indicated 
by a red square and the absence of a gene or cluster by a black square. The 
relationship between strains based on this analysis is depicted by the tree at the top of the figure. 
The strains and their serotypes are indicated (NT: nontypeable). Clusters with identical profiles 
are reduced to a single horizontal line, and the number of genes in each cluster is indicated on the 
right. The clusters of 5 or more genes, labeled in red text and numbered, are listed in Table 6. 
The 1698 genes shared by all 19 strains are labeled in green text. Applicants identified the 
following subsets: 

GBS Subset 13 (a): Cluster 1 (from Table 6). 

GBS Subset 13 (b): Cluster 2 (from Table 6). 

GBS Subset 13 (c): Cluster 3 (from Table 6). 

GBS Subset 13 (d): Cluster 4 (from Table 6). 

GBS Subset 13 (e): Cluster 5 (from Table 6). 

GBS Subset 13 (f): Cluster 6 (from Table 6). 

GBS Subset 13 (g): Cluster 7 (from Table 6). 

GBS Subset 13 (h): Cluster 8 (from Table 6). 

GBS Subset 13 (i): Cluster 9 (from Table 6). 

GBS Subset 13 (j): Cluster 10 (from Table 6). 

GBS Subset 13 (k): Cluster 11 (from Table 6). 

GBS Subset 13 (l): Cluster 12 (from Table 6). 

GBS Subset 13 (m): Cluster 13 (from Table 6). 
GBS Subset 13 (n): Cluster 14 (from Table 6). 
GBS Subset 13 (o): Cluster 15 (from Table 6). 
GBS Subset 13 (p): Cluster 16 (from Table 6). 
GBS Subset 13 (q): 1698 ORFs shared by all strains. 
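Reducing genes with identical presence/absence profiles to a single line, as in Figure 5, amounts to grouping genes by their profile across the strains. A minimal sketch, assuming each gene's hybridization results are already encoded as a tuple of present/absent calls in a fixed strain order; it does not reproduce the tree shown in the figure, and the function and gene names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Profile = Tuple[bool, ...]


def profile_clusters(
    presence: Dict[str, Profile],
    min_size: int = 5,
) -> Tuple[List[List[str]], List[str]]:
    """Group genes whose presence/absence profiles are identical.

    Returns (clusters of at least min_size genes that are absent from at
    least one strain, largest first; genes present in every strain).
    """
    groups: Dict[Profile, List[str]] = defaultdict(list)
    for gene, profile in presence.items():
        groups[profile].append(gene)
    shared_by_all = sorted(
        gene for profile, genes in groups.items() if all(profile) for gene in genes
    )
    clusters = [
        sorted(genes)
        for profile, genes in groups.items()
        if len(genes) >= min_size and not all(profile)
    ]
    clusters.sort(key=len, reverse=True)
    return clusters, shared_by_all
```

In the terms used above, the second return value would roughly correspond to GBS Subset 13(q), and each sufficiently large profile group to one numbered cluster of Table 6.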
The invention is also based on the identification of the polynucleotide sequences of 82 
genes from up to 11 different GBS strains. 19 of these genes are listed in Table 7. A further 
GBS Subset 14 includes this set of polynucleotide sequences from the 11 strains and their 
encoded polypeptide sequences. In particular, GBS Subset 14 contains a Subset of 
polynucleotide fragments of 10 or more contiguous nucleotides which are conserved 
between two or more strains (GBS Subset 14(a)). GBS Subset 14 further includes a Subset of 
polynucleotide fragments of 15 or more contiguous nucleotides which are conserved 
between two or more strains (GBS Subset 14(b)). GBS Subset 14 further includes a Subset of 
polynucleotide fragments of 10 or more contiguous nucleotides which are conserved 
between three or more strains (GBS Subset 14(c)). GBS Subset 14 further includes a Subset of 
polynucleotide fragments of 10 or more contiguous nucleotides which are conserved 
between four or more strains (GBS Subset 14(d)). 

GBS Subset 14 further includes a Subset of polypeptide fragments of 5 or more 
contiguous amino acids which are conserved between two or more strains (GBS Subset 
14(e)). GBS Subset 14 further includes a Subset of polypeptide fragments of 5 or more 
contiguous amino acids which are conserved between three or more strains (GBS Subset 14(f)). 
GBS Subset 14 further includes a Subset of polypeptide fragments of 5 or more contiguous 
amino acids which are conserved between four or more strains (GBS Subset 14(g)). GBS 
Subset 14 further includes a Subset of polypeptide fragments of 10 or more contiguous amino 
acids which are conserved across two or more strains (GBS Subset 14(h)). 
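The fragment-based Subsets 14(a) to 14(h) can be enumerated by recording, for every window of fixed length, which strains contain it; any conserved fragment longer than the window necessarily contains conserved windows of that length. The sketch below assumes the gap-free sequences for one gene are available as plain strings keyed by (hypothetical) strain names. It uses exact matching only and, for nucleotide input, does not consider the reverse-complement strand.

```python
from collections import defaultdict
from typing import Dict, Set


def conserved_fragments(
    sequences: Dict[str, str],
    k: int = 10,
    min_strains: int = 2,
) -> Dict[str, Set[str]]:
    """Map each length-k fragment found in at least min_strains strains to
    the set of strains containing it. Works for nucleotide or amino-acid
    strings (e.g. k=10 for Subset 14(a), k=5 for Subset 14(e))."""
    occurrences: Dict[str, Set[str]] = defaultdict(set)
    for strain, seq in sequences.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            occurrences[seq[i:i + k]].add(strain)
    return {frag: strains for frag, strains in occurrences.items() if len(strains) >= min_strains}
```

Raising `min_strains` to 3 or 4 gives the stricter Subsets 14(c), 14(d), 14(f) and 14(g); raising `k` gives Subsets 14(b) and 14(h).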
The invention provides for methods of screening a Streptococcal genome for a conserved 
or a specific genomic sequence using one or more of the Subsets of the invention. 

The invention further provides for an immunogenic composition comprising a 
polypeptide expressed by one or more of the polynucleotides in one or more of the Subsets of the 
invention, and methods for designing an immunogenic composition by selecting one or more 
polypeptides expressed by one or more of the polynucleotides in one or more of the Subsets of 
the invention. Preferably, the immunogenic compositions of the invention comprise at least two, 
three, four or five polypeptides encoded by polynucleotides within the same Subset. 
The invention further provides for methods of screening compounds for activity against 
Streptococcal bacteria, which method comprises contacting the compounds with a polypeptide 
expressed by a polynucleotide from one of the Subsets of the invention. 

The invention further provides for compositions comprising one or more of the 
polynucleotides, and fragments thereof, selected from the group consisting of the sequences set 
forth in Tables 13-31 or 40-89. 

The invention further provides for compositions comprising polypeptides and fragments 
thereof encoded by the polynucleotides set forth in Tables 13-31 or 40-89. 

The invention provides for compositions comprising polypeptides and fragments thereof 
set forth in Tables 13-31 or 40-89. 

BRIEF DESCRIPTION OF THE TABLES AND DRAWINGS 

Table 1 comprises a complete list of GBS predicted genes, listed by SAGxxxx ORF 
number. The SAGxxxx ORF number corresponds to the genomic sequence for the 
Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by August 
28, 2002 at http://www.tigr.org or at the GenBank database at accession number AE009948. 
This table also includes the predicted amino acid size of the predicted expressed protein and the 
predicted function, if known. 

Table 2 comprises a list of predicted and experimentally characterized surface and 
secreted proteins from GBS. The SAGxxxx ORF number corresponds to the genomic sequence 
for the Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by 
August 28, 2002 at http://www.tigr.org or at the GenBank database at accession number 
AE009948. 

Table 3 lists GBS genes which were shared among GBS, GAS and pneumococcus, but 
which were not found in any of the other completely sequenced genomes. The SAGxxxx ORF 
number corresponds to the genomic sequence for the Streptococcus agalactiae type V strain 2603 
V/R available either at the TIGR website by August 28, 2002 at http://www.tigr.org or at the 
GenBank database at accession number AE009948. 

Table 4 depicts GBS genes which are predicted to have been recently duplicated within 
the genome. The SAGxxxx ORF number corresponds to the genomic sequence for the 
Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by August 
28, 2002 at http://www.tigr.org or at the GenBank database at accession number AE009948. 

Table 5 lists the 19 GBS strains used for comparative genome hybridisations and 
phylogenetic analysis. 
Table 6 lists clusters of GBS genes derived from phylogenetic profiling of GBS strains 
based on comparative genome hybridisations. The SAGxxxx ORF number corresponds to the 
genomic sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the 
TIGR website by August 28, 2002 at http://www.tigr.org or at the GenBank database at 
accession number AE009948. 

Table 7 lists the GBS genes used for phylogenetic analyses of the 19 GBS strains. The 
SAGxxxx ORF number corresponds to the genomic sequence for the Streptococcus agalactiae 
type V strain 2603 V/R available either at the TIGR website by August 28, 2002 at 
http://www.tigr.org or at the GenBank database at accession number AE009948. 

Table 8 lists the 1060 GBS ORF's which are shared with GAS and pneumococcus. The 
ORFxxxxx reference number can be translated to SAGxxxx ORF number by using Table 32. 
The SAGxxxx ORF number corresponds to the genomic sequence for the Streptococcus 
agalactiae type V strain 2603 V/R available either at the TIGR website by August 28, 2002 at 
http://www.tigr.org or at the GenBank database at accession number AE009948. 

Table 9 lists the 176 GBS ORF's which are shared with pneumococcus but which are not 
homologous to a GAS gene. The ORFxxxxx reference number can be translated to SAGxxxx 
ORF number by using Table 32. The SAGxxxx ORF number corresponds to the genomic 
sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR 
website by August 28, 2002 at http://www.tigr.org or at the GenBank database at accession 
number AE009948. 

Table 10 lists the 225 GBS ORF's which are shared with GAS but which are not 
homologous with a pneumococcus gene. The ORFxxxxx reference number can be translated to 
SAGxxxx ORF number by using Table 32. The SAGxxxx ORF number corresponds to the 
genomic sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the 
TIGR website by August 28, 2002 at http://www.tigr.org or at the GenBank database at 
accession number AE009948. 

Table 11 lists 683 GBS ORF's which are not shared with either GAS or pneumococcus. 
The ORFxxxxx reference number can be translated to SAGxxxx ORF number by using Table 32. 
The SAGxxxx ORF number corresponds to the genomic sequence for the Streptococcus 
agalactiae type V strain 2603 V/R available either at the TIGR website by August 28, 2002 at 
http://www.tigr.org or at the GenBank database at accession number AE009948. 

Table 12 lists 315 GBS ORF's which are not shared with GAS, pneumococcus or any 
other published genomic sequence. The ORFxxxxx reference number can be translated to 
SAGxxxx ORF number by using Table 32. The SAGxxxx ORF number corresponds to the 
genomic sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the 
TIGR website by August 28, 2002 at http://www.tigr.org or at the GenBank database at 
accession number AE009948. 

Table 13 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0466. An alignment of each of the sequences is also included.

Table 14 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0471. An alignment of each of the sequences is also included. 

Table 15 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0492. An alignment of each of the sequences is also included. 

Table 16 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0767. An alignment of each of the sequences is also included.

Table 17 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1086. An alignment of each of the sequences is also included. 

Table 18 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1600. An alignment of each of the sequences is also included.

Table 19 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1680. An alignment of each of the sequences is also included. 

Table 20 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1723. An alignment of each of the sequences is also included. 

Table 21 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0079. An alignment of each of the sequences is also included.

Table 22 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0093. An alignment of each of the sequences is also included. 

Table 23 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0163. An alignment of each of the sequences is also included.

Table 24 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0290. An alignment of each of the sequences is also included. 

Table 25 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0368. An alignment of each of the sequences is also included. 

Table 26 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0503. An alignment of each of the sequences is also included.

Table 27 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG1473. An alignment of each of the sequences is also included. 

Table 28 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG1552. An alignment of each of the sequences is also included. 

Table 29 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG1641. An alignment of each of the sequences is also included.

Table 30 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG2147. An alignment of each of the sequences is also included. 

Table 31 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG2148. An alignment of each of the sequences is also included. 

Table 32 provides a conversion table for the ORFxxxx reference numbers to the 
SAGxxxx reference numbers. The SAGxxxx ORF number corresponds to the genomic sequence 
for the Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by 
August 28, 2002 at http://www.tigr.org or at the GenBank database at accession number 
AE009948. 

Table 33 lists the 1006 GAS ORF's which are shared with GBS and Spn. The sequences 
corresponding to these ORFs were published in GenBank, Accession No. AAK33146 (protein 
sequence). A link to the corresponding polynucleotide sequence is also available. The numbers 
for the GAS ORF refer directly to their GenBank entries. 

Table 34 lists the 212 GAS ORF's which are shared with GBS but which do not have 
homologues with pneumococcus. The sequences corresponding to these ORFs were published in 
GenBank, Accession No. AAK33146 (protein sequence). A link to the corresponding
polynucleotide sequence is also available. The numbers for the GAS ORF refer directly to their 
GenBank entries. 

Table 35 lists the 62 GAS ORF's which have homologues with pneumococcus but which 
do not have homologues with GBS. The sequences corresponding to these ORFs were published 
in GenBank, Accession No. AAK33146 (protein sequence). A link to the corresponding 
polynucleotide sequence is also available. The numbers for the GAS ORF refer directly to their 
GenBank entries. 

Table 36 lists the 1034 Spn ORF's which are shared with GBS and GAS. These ORF's 
were published in GenBank. The numbers for Spn correspond to the entry for AE005672. 

Table 37 lists the 195 Spn ORF's which are shared with GBS but do not have 
homologues with GAS. These ORF's were published in GenBank. The numbers for Spn 
correspond to the entry for AE005672. 

Table 38 lists the 74 Spn ORF's which are shared with GAS but do not have homologues
with GBS. These ORF's were published in GenBank. The numbers for Spn correspond to the
entry for AE005672. 

Table 40 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS 
ORF SAG0635. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 41 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS 
ORF SAG0649. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 42 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0764. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 43 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0079. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 44 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0416. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 45 lists the polynucleotide and polypeptide sequences of 5 strains relating to GBS 
ORF SAG1404. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 46 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1615. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 47 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0739. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 48 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1474. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 49 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1502. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 50 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS 
ORF SAG1024. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 51 lists the polynucleotide and polypeptide sequences of 7 strains relating to GBS
ORF SAG0677. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 52 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1823. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 53 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0755. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 54 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0949. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 55 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1592. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 56 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0806. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 57 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1488. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 58 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0182. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 59 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG2147. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 60 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1945. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 61 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS 
ORF SAG1030. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 62 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0690. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 63 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1912. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 64 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0827. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 65 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS 
ORF SAG0231. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 66 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0754. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 67 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0475. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 68 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0499. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 69 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0032. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 70 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS 
ORF SAG1280. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 71 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1333. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 72 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0941. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 73 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0981. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 74 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1572. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 75 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0671. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 76 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0260. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 77 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG2059. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 78 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1016. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 79 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG2150. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 80 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS 
ORF SAG1266. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 81 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0011. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 82 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0165. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 83 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0108. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 84 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0267. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 85 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1361. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 86 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1393. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 87 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS 
ORF SAG0645. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 88 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0477. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 89 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1350. An alignment of the polynucleotide and polypeptide sequences is also included. 

Figure 1 is a circular representation of the GBS genome and comparative hybridisations 
using microarrays. A color version of Figure 1 can be found in Tettelin et al., PNAS (2002)
99(19): 12391-12396 and online at www.pnas.org.

Figure 2 is a schematic representation of in silico comparisons between streptococci. A 
color version of Figure 2 can be found in Tettelin et al., PNAS (2002) 99(19): 12391-12396
and online at www.pnas.org. 

Figure 3 depicts a phylogenetic tree of GBS strains based on PCR sequences.

Figure 4 depicts a linear representation of the GBS genome. A color version of Figure 4
can be found in the supporting information to Tettelin et al., PNAS (2002) 99(19):
12391-12396 available online at www.pnas.org.

Figure 5 demonstrates phylogenetic profiling of GBS strains based on comparative
genome hybridisations. A color version of Figure 5 can be found in the supporting information
to Tettelin et al., PNAS (2002) 99(19): 12391-12396 available online at www.pnas.org.

DETAILED DESCRIPTION OF THE INVENTION 

The invention relates to polynucleotides which are conserved or specific to one or more 
species of Streptococcus, Streptococcus species serotypes, and/or serotype isolates. In particular, 
the invention relates to polynucleotides from Streptococcus which are conserved or specific to
one or more of the species of S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group
A streptococcus" or "GAS"), and S. agalactiae ("group B streptococcus" or "GBS"). The
invention further relates to polynucleotides which are conserved or specific to one or more
Streptococcal species serotypes, such as GBS serotypes Ia, Ib, II, III, IV, V, VI, VII, and VIII.
The invention still further relates to polynucleotides which are conserved or specific to one or
more clinical isolates of a Streptococcus species. 

In order to facilitate an understanding of the invention, selected terms used in the 
application will be discussed below. 

As used herein, the phrase "species of Streptococcus" generally refers to species of the
Streptococcus genus, including S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group A
streptococcus" or "GAS") and S. agalactiae ("group B streptococcus" or "GBS").

As used herein, the phrase "Streptococcus species serotypes" generally refers to
subdivisions based on a distinguishing characteristic within a specific Streptococcus species.
The distinguishing characteristic can be identified by any of a wide range of diagnostic tools.
For instance, GBS is generally recognized as comprising at least nine subdividing serotypes
based on the structure of their polysaccharide capsule.

As used herein, the phrases "serotype isolates" or "clinical isolates" generally refer to
specific isolated bacterial strains of a specific Streptococcal species and serotype.

As used herein in reference to bacterial genomes, the phrases "conserved" or "shared"
generally refer to genomic sequences which have homologues in the two or more genomes in the
reference. Homology references, as used in this application, are generally based on comparisons
using FASTA3. See Pearson (2000) Methods Mol. Biol. 132:185-219. When the homology
reference involves a comparison between genes in GBS, GAS or Spn, homologous or shared
genes are typically defined by using a FASTA3 P value cutoff of 10^-15. Where the homology
reference involves a comparison between GBS, GAS or Spn and all other completely sequenced
genomes, homologous or shared genes are typically defined by using a FASTA3 P value cutoff
of 10^-5 or lower.
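The two cutoffs above can be illustrated with a minimal sketch that decides, for one query gene, in which genomes it has a homologue. It assumes the best FASTA3 P value per genome has already been obtained (None when there is no hit) and that a P value at or below the cutoff counts as a homologue; the function and genome labels are hypothetical and not part of the original analysis pipeline.

```python
from typing import Dict, Optional, Set

# Cutoffs from the text: 10^-15 among GBS, GAS and Spn; 10^-5 against other genomes.
STREP_GENOMES = {"GBS", "GAS", "Spn"}
STREP_CUTOFF = 1e-15
OTHER_CUTOFF = 1e-5


def homologue_genomes(best_p: Dict[str, Optional[float]]) -> Set[str]:
    """Return the genomes in which the query gene has a homologue.

    best_p maps a genome name to the best (smallest) FASTA3 P value found
    for the query gene in that genome, or None if there was no hit.
    Assumption: P <= cutoff is treated as "shared".
    """
    hits = set()
    for genome, p in best_p.items():
        if p is None:
            continue  # no FASTA3 hit at all in this genome
        cutoff = STREP_CUTOFF if genome in STREP_GENOMES else OTHER_CUTOFF
        if p <= cutoff:
            hits.add(genome)
    return hits


# Hypothetical P values for a single GBS query gene.
print(homologue_genomes({"GAS": 3e-40, "Spn": 2e-8, "E. coli": 1e-6}))
```

Here the Spn hit (2e-8) is not counted because the streptococcal cutoff is much stricter, while the hit in another genome (1e-6) passes the looser 10^-5 cutoff.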

As used herein in reference to bacterial genomes, the phrases "specific to" or "not shared" 
generally refer to genomic sequences which do not have homologues in the two or more 
genomes in the reference. 

Other software programs to compare identity and to determine homology between 
nucleotide sequences are known in the art, for example those described in section 7.7.18 of 
Current Protocols in Molecular Biology (F.M. Ausubel et al, eds., 1987) Supplement 30. A 
preferred alignment program is GCG Gap (Genetics Computer Group, Wisconsin, Suite Version 
10.1), preferably using default parameters, which are as follows: open gap = 3; extend gap = 1.

Sequences within a Subset of the invention include sequences which hybridize to the
listed genes. Hybridization reactions can be performed under conditions of different
"stringency". Conditions that increase the stringency of a hybridization reaction are widely known
and published in the art [e.g. page 7.52 of Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual NY, Cold Spring Harbor Laboratory]. Examples of relevant conditions 
include (in order of increasing stringency): incubation temperatures of 25°C, 37°C, 50°C, 55°C 
and 68°C; buffer concentrations of 10 x SSC, 6 x SSC, 1 x SSC, 0.1 x SSC (where SSC is 
0.15 M NaCl and 15 mM citrate buffer) and their equivalents using other buffer systems; 
formamide concentrations of 0%, 25%, 50%, and 75%; incubation times from 5 minutes to 24 
hours; 1, 2, or more washing steps; wash incubation times of 1, 2, or 15 minutes; and wash 
solutions of 6 x SSC, 1 x SSC, 0.1 x SSC, or de-ionized water. Hybridization techniques and 
their optimization are well known in the art [e.g. see Sambrook et al; RNA Methodologies 
(Farrell, 1998) (Academic Press; ISBN 0-12-249695-7); Current Protocols in Molecular Biology 
(F.M. Ausubel et al., eds., 1987) Supplement 30; Short Protocols in Molecular Biology (4th
edition, 1999) Ausubel et al. eds. ISBN 0-471-32938-X; US patent 5,707,829 etc.].
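Because the buffer conditions above are given as multiples of SSC, the actual salt concentrations follow by simple scaling of the 1 x SSC definition (0.15 M NaCl, 15 mM citrate). A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
from typing import Tuple

# 1 x SSC as defined in the text: 0.15 M (150 mM) NaCl and 15 mM citrate.
NACL_1X_MM = 150.0
CITRATE_1X_MM = 15.0


def ssc_composition(strength: float) -> Tuple[float, float]:
    """Return (NaCl in mM, citrate in mM) for an n x SSC solution."""
    return strength * NACL_1X_MM, strength * CITRATE_1X_MM


# The buffer strengths listed above, from least to most stringent.
for n in (10, 6, 1, 0.1):
    nacl, citrate = ssc_composition(n)
    print(f"{n:>4} x SSC: {nacl:g} mM NaCl, {citrate:g} mM citrate")
```

Lower salt destabilises mismatched duplexes, which is why the list moves from 10 x SSC (1.5 M NaCl) toward 0.1 x SSC (15 mM NaCl) as stringency increases.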

Identity between polypeptide sequences can be determined using software programs 
known in the art, for example those described in section 7.7.18 of Current Protocols in 
Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30. A preferred alignment is
determined by the Smith-Waterman homology search algorithm [Smith & Waterman (1981) Adv.
Appl. Math. 2:482-489] using an affine gap search with a gap open penalty of 12 and a gap
extension penalty of 2, and the BLOSUM62 matrix.
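As an illustration of the affine-gap local alignment described above, the following is a minimal sketch of the Smith-Waterman score using Gotoh's three-matrix recurrences with the stated gap open (12) and gap extension (2) penalties. Two assumptions keep it short: the BLOSUM62 matrix is replaced by a hypothetical placeholder, `simple_score` (+5 for a match, -4 for a mismatch), and a gap of length k is charged open + (k - 1) x extend (some tools charge open + k x extend instead).

```python
from typing import Callable

NEG_INF = float("-inf")


def simple_score(a: str, b: str) -> int:
    """Placeholder substitution score; a real comparison would use BLOSUM62."""
    return 5 if a == b else -4


def smith_waterman_affine(
    s1: str,
    s2: str,
    score: Callable[[str, str], int] = simple_score,
    gap_open: int = 12,
    gap_extend: int = 2,
) -> int:
    """Best local alignment score with affine gaps (Gotoh recurrences).

    Assumption: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # alignment ends in a gap in s1
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # alignment ends in a gap in s2
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap from H, or extend an existing gap of the same kind.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(s1[i - 1], s2[j - 1])
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman_affine("ACDEFG", "ACDXEFG"))
```

In the example, bridging the inserted residue with a single gap (six matches at +5, one gap opening at -12) scores higher than any ungapped local match, which is the behaviour the affine scheme is meant to capture.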

Typically, 50% identity or more between two proteins may be considered to be an
indication of functional equivalence. A reference to a percentage sequence identity between two
amino acid sequences means that, when the two sequences are aligned, that percentage of amino
acids are the same in comparing the two sequences.
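The percentage identity definition above can be expressed as a short sketch operating on an already-computed pairwise alignment. The conventions here are assumptions, since tools differ: both rows are equal-length strings with "-" for gaps, gap columns count toward the alignment length, and a gap never counts as a match. The 50% threshold is the one stated in the text; the function names are hypothetical.

```python
def percent_identity(aligned1: str, aligned2: str) -> float:
    """Percentage of alignment columns in which both rows carry the same residue."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    if not aligned1:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aligned1, aligned2) if a == b and a != "-")
    return 100.0 * matches / len(aligned1)


def suggests_functional_equivalence(aligned1: str, aligned2: str,
                                    threshold: float = 50.0) -> bool:
    """Apply the 50%-identity rule of thumb described above."""
    return percent_identity(aligned1, aligned2) >= threshold


print(percent_identity("ACDE-", "ACDFG"))  # 3 identical columns out of 5
```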

The terms "polypeptide", "protein" and "amino acid sequence" as used herein generally
refer to a polymer of amino acid residues and are not limited to a minimum length of the product.
Thus, peptides, oligopeptides, dimers, multimers, and the like, are included within the definition.
Both full-length proteins and fragments thereof are encompassed by the definition. Minimum
fragments of polypeptides useful in the invention can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 18, 20, 25, 30, 35, 40 or 50 amino acids. Typically, polypeptides useful in this invention
can have a maximum length suitable for the intended application. Generally, the maximum
length is not critical and can easily be selected by one skilled in the art.

Reference to polypeptides and the like also includes derivatives of the amino acid 
sequences of the invention. Such derivatives can include postexpression modifications of the 
polypeptide, for example, glycosylation, acetylation, phosphorylation, and the like. Amino acid 
derivatives can also include modifications to the native sequence, such as deletions, additions
and substitutions (generally conservative in nature), so long as the protein maintains the desired
activity. These modifications may be deliberate, as through site-directed mutagenesis, or may be
accidental, such as through mutations of hosts which produce the proteins or errors due to PCR
amplification. Furthermore, modifications may be made that have one or more of the following
effects: reducing toxicity; facilitating cell processing (e.g., secretion, antigen presentation, etc.);
and facilitating presentation to B-cells and/or T-cells.

A "recombinant" protein is a protein which has been prepared by recombinant DNA
techniques as described herein. In general, the gene of interest is cloned and then expressed in
transformed organisms, as described further below. The host organism expresses the foreign
gene to produce the protein under expression conditions. The polypeptides of the invention may
be prepared by recombinant means.

The term "polynucleotide", as known in the art, generally refers to a nucleic acid
molecule. A "polynucleotide" can include both double- and single-stranded sequences and refers
to, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic RNA and
DNA sequences from viral (e.g. RNA and DNA viruses and retroviruses) or prokaryotic DNA,
and especially synthetic DNA sequences. The term also captures sequences that include any of
the known base analogs of DNA and RNA, and includes modifications such as deletions,
additions and substitutions (generally conservative in nature), to the native sequence, so long as
the nucleic acid molecule encodes a therapeutic or antigenic protein. These modifications may
be deliberate, as through site-directed mutagenesis, or may be accidental, such as through 
mutations of hosts that produce the antigens. Modifications of polynucleotides may have any 
number of effects including, for example, facilitating expression of the polypeptide product in a 
host cell. 

The term "polynucleotide" further includes DNA, RNA, DNA/RNA hybrids, DNA and
RNA analogues such as those containing modified backbones (with modifications in the sugar
and/or phosphates e.g. phosphorothioates, phosphoramidites etc.), and also peptide nucleic acids
(PNA) and any other polymer comprising purine and pyrimidine bases or other natural,
chemically or biochemically modified, non-natural, or derivatized nucleotide bases etc. Nucleic
acid according to the invention can be prepared in many ways (e.g. by chemical synthesis, from
genomic or cDNA libraries, from the organism itself etc.) and can take various forms (e.g. single 
stranded, double stranded, vectors, probes etc.). 

A polynucleotide can encode a biologically active (e.g., immunogenic or therapeutic)
protein or polypeptide. Depending on the nature of the polypeptide encoded by the
polynucleotide, a polynucleotide can include as few as 10 nucleotides, e.g., where the
polynucleotide encodes an antigen. The polynucleotides of the invention may comprise at least
10, 13, 15, 18, 20, 22, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 80, 90 or 100 consecutive
nucleotides.
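The phrase "at least N consecutive nucleotides" refers to uninterrupted stretches of a parent sequence. A minimal sketch, with a hypothetical helper name, that enumerates every such stretch of a sequence for a chosen minimum length:

```python
from typing import Iterator


def consecutive_fragments(seq: str, min_len: int) -> Iterator[str]:
    """Yield every run of at least min_len consecutive residues of seq."""
    for start in range(len(seq)):
        for end in range(start + min_len, len(seq) + 1):
            yield seq[start:end]


print(list(consecutive_fragments("ACGTA", 4)))
```

For a sequence of length L and minimum length k this yields (L - k + 1)(L - k + 2) / 2 fragments, so in practice one would select candidate fragments rather than enumerate them all for a full-length gene.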

By "isolated" is meant, when referring to a polynucleotide or a polypeptide, that the
indicated molecule is separate and discrete from the whole organism with which the molecule is
found in nature or, when the polynucleotide or polypeptide is not found in nature, is sufficiently 
free of other biological macromolecules so that the polynucleotide or polypeptide can be used for 
its intended purpose. 
"Antibody" as known in the art includes one or more biological moieties that, through
chemical or physical means, can bind to or associate with an epitope of a polypeptide of interest.
The antibodies of the invention specifically bind to infectious prion conformations. The term
"antibody" includes antibodies obtained from both polyclonal and monoclonal preparations, as
well as the following: hybrid (chimeric) antibody molecules (see, for example, Winter et al.
(1991) Nature 349:293-299; and U.S. Patent No. 4,816,567); F(ab')2 and F(ab) fragments; Fv
molecules (non-covalent heterodimers, see, for example, Inbar et al. (1972) Proc Natl Acad Sci
USA 69:2659-2662; and Ehrlich et al. (1980) Biochem 19:4091-4096); single-chain Fv molecules
(sFv) (see, for example, Huston et al. (1988) Proc Natl Acad Sci USA 85:5879-5883); dimeric
and trimeric antibody fragment constructs; minibodies (see, e.g., Pack et al. (1992) Biochem
31:1579-1584; Cumber et al. (1992) J Immunology 149B:120-126); humanized antibody
molecules (see, for example, Riechmann et al. (1988) Nature 332:323-327; Verhoeyen et al.
(1988) Science 239:1534-1536; and U.K. Patent Publication No. GB 2,276,169, published 21
September 1994); and any functional fragments obtained from such molecules, wherein such
fragments retain immunological binding properties of the parent antibody molecule. The term
"antibody" further includes antibodies obtained through non-conventional processes, such as
phage display.

As used herein, the term "monoclonal antibody" refers to an antibody composition
having a homogeneous antibody population. The term is not limited regarding the species or
source of the antibody, nor is it intended to be limited by the manner in which it is made. Thus,
the term encompasses antibodies obtained from murine hybridomas, as well as human
monoclonal antibodies obtained using human rather than murine hybridomas. See, e.g., Cote, et 
al. Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, 1985, p 77. 

An "immunogenic composition" as used herein refers to a composition that comprises an
antigenic molecule where administration of the composition to a subject results in the
development in the subject of a humoral and/or a cellular immune response to the antigenic
molecule of interest. The immunogenicity of the composition or the antigenicity of the molecule 
may be facilitated by the use of an adjuvant. 

The practice of the present invention will employ, unless otherwise indicated,
conventional methods of chemistry, biochemistry, molecular biology, immunology and
pharmacology, within the skill of the art. Such techniques are explained fully in the literature.
See, e.g., Remington's Pharmaceutical Sciences, 18th Edition (Easton, Pennsylvania: Mack
Publishing Company, 1990); Methods In Enzymology (S. Colowick and N. Kaplan, eds.,
Academic Press, Inc.); and Handbook of Experimental Immunology, Vols. I-IV (D.M. Weir and
C.C. Blackwell, eds., 1986, Blackwell Scientific Publications); Sambrook, et al., Molecular
Cloning: A Laboratory Manual (2nd Edition, 1989); Handbook of Surface and Colloidal
Chemistry (Birdi, K.S. ed., CRC Press, 1997); Short Protocols in Molecular Biology, 4th ed.
(Ausubel et al. eds., 1999, John Wiley & Sons); Molecular Biology Techniques: An Intensive
Laboratory Course, (Ream et al., eds., 1998, Academic Press); PCR (Introduction to
Biotechniques Series), 2nd ed. (Newton & Graham eds., 1997, Springer Verlag); Peters and
Dalrymple, Fields Virology (2d ed), Fields et al. (eds.), B.N. Raven Press, New York, NY.

It is understood that the antibodies and methods of this invention are not limited to 
particular formulations or process parameters as such may, of course, vary. It is also to be 
understood that the terminology used herein is for the purpose of describing particular 
embodiments of the invention only, and is not intended to be limiting. 

All publications, patents and patent applications cited herein are hereby incorporated by 
reference in their entirety. 

Vaccines and Immunisation 

The invention provides an immunogenic composition comprising a polypeptide, or a 
fragment thereof, which is encoded by a polynucleotide sequence which is conserved across one 
or more species of Streptococcus. 

The polynucleotide is preferably conserved across one or more species of Streptococcus 
selected from the group consisting of GBS, GAS and pneumococcus. In one embodiment, the 
polynucleotide is a GBS polynucleotide which is homologous with at least one gene from both 
GAS and pneumococcus. Preferably, the GBS polynucleotide is selected from GBS Subset 1, 
which includes 1060 GBS genes which have homologues with both GAS and pneumococcus 
(Table 8). 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from both GBS and pneumococcus. Preferably, the GAS 
polynucleotide is selected from GAS Subset 1, which includes 1006 GAS genes which have 
homologues with both GBS and pneumococcus. 

In another embodiment, the polynucleotide is a pneumococcal polynucleotide which is
homologous with at least one gene from both GAS and GBS. Preferably, the pneumococcus
polynucleotide is selected from Spn Subset 1, which includes 1034 pneumococcal genes which
have homologues with both GBS and GAS.

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous with at least one gene from GAS. Preferably, the polynucleotide is selected from
one of the genes listed in GBS Subset 2, which includes 225 GBS genes which have homologues
with GAS, but not with pneumococcus.

WO 2004/018646 PCT/US2003/026827

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous with at least one gene from pneumococcus. Preferably, the polynucleotide is 
selected from GBS Subset 3, which includes 176 GBS genes which have homologues with
pneumococcus. 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from GBS. Preferably, the polynucleotide is selected from 
GAS Subset 2, which includes 212 GAS genes which have a homologue with GBS, but not with 
pneumococcus. 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from pneumococcus. Preferably, the polynucleotide is selected 
from GAS Subset 3, which includes 62 GAS genes which have a homologue with pneumococcus, 
but not with GBS. 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is 
homologous with at least one gene from GBS. Preferably, the polynucleotide is selected from 
Spn Subset 2, which includes 195 Spn genes which have a homologue with GBS, but not with GAS. 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is 
homologous with at least one gene from GAS. Preferably, the polynucleotide is selected from 
Spn Subset 3, which includes 74 Spn genes which have a homologue with GAS, but not with GBS. 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to one or 
more species of Streptococcus. 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide which is specific to GBS, GAS and 
pneumococcus. In one embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from both GAS and pneumococcus. Preferably, the GBS 
polynucleotide is selected from GBS Subset 1. In an alternative embodiment, the polynucleotide 
is a GBS polynucleotide which is homologous to at least one gene from both GAS and 
pneumococcus, but which is not homologous to a gene in any other published bacterial genome 
at the time of the invention. Preferably, the GBS polynucleotide is selected from one of the 12 
GBS genes included in GBS Subset 1(a) (Table 3). 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous to at least one gene in both GBS and pneumococcus. Preferably, the GAS 
polynucleotide is selected from GAS Subset 1. In another embodiment, the polynucleotide is a 
GAS polynucleotide which is homologous to at least one gene in both GBS and pneumococcus 
but which is not homologous to any gene in any other published bacterial genome at the time of 
the invention. Preferably, the GAS polynucleotide is selected from GAS Subset 1(a). 

Alternatively, the polynucleotide is a pneumococcus polynucleotide which is homologous 
to at least one gene in both GBS and GAS. Preferably, the pneumococcus polynucleotide is 
selected from Spn Subset 1. In another embodiment, the polynucleotide is a pneumococcus 
polynucleotide which is homologous to at least one gene in both GBS and GAS but which does 
not have a homologue in any other published bacterial genome at the time of the invention. 
Preferably, the pneumococcus polynucleotide is selected from Spn Subset 1(a). 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GBS. 
In one embodiment, the polynucleotide is a GBS polynucleotide which is not homologous to a 
gene in either GAS or pneumococcus. Preferably, the GBS polynucleotide is selected from one 
of the 683 GBS genes included in GBS Subset 4. In a further embodiment, the polynucleotide is 
a GBS polynucleotide which is not homologous to a gene in either GAS or pneumococcus or any 
other published bacterial genome at the time of the invention. Preferably, the GBS 
polynucleotide is selected from one of the 315 GBS genes in GBS Subset 4(a). 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GAS. 
In one embodiment, the polynucleotide is a GAS polynucleotide which is not homologous to a 
gene in either GBS or pneumococcus. Preferably, the GAS polynucleotide is selected from one 
of the 416 GAS genes included in GAS Subset 4. In a further embodiment, the polynucleotide is 
a GAS polynucleotide which does not have a homologue in either GBS or pneumococcus or in 
any other published bacterial genome at the time of the invention. Preferably, the GAS 
polynucleotide is selected from GAS Subset 4(a). 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to 
pneumococcus. In one embodiment, the polynucleotide is a pneumococcus polynucleotide 
which is not homologous to a gene in either GBS or GAS. Preferably, the pneumococcus 
polynucleotide is selected from one of the 836 Spn genes included in Spn Subset 4. In a further 
embodiment, the polynucleotide is a pneumococcus polynucleotide which does not have a 
homologue in either GBS or GAS or in any other published bacterial genome at the time of the 
invention. Preferably, the pneumococcus polynucleotide is selected from Spn Subset 4(a). 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GBS 
and GAS. In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous 
to at least one gene from GAS but is not homologous to a gene from pneumococcus. Preferably, 
the GBS polynucleotide is selected from one of the 225 GBS genes included in GBS Subset 2. 
In another embodiment, the GBS polynucleotide is homologous to at least one gene from GAS 
but is not homologous to any gene from pneumococcus and does not have a homologue in any 
other published bacterial genome at the time of the invention. Preferably, the GBS 
polynucleotide is selected from GBS Subset 2(a). 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous to at least one gene from GBS but is not homologous to any gene from 
pneumococcus. Preferably, the GAS polynucleotide is selected from one of the 212 GAS genes 
included in GAS Subset 2. In another embodiment, the GAS polynucleotide is homologous to at 
least one gene from GBS but is not homologous to any gene from pneumococcus and does not 
have a homologue in any other published bacterial genome at the time of the invention. 
Preferably, the GAS polynucleotide is selected from GAS Subset 2(a). 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GBS 
and pneumococcus. In one embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from pneumococcus but is not homologous to any gene from 
GAS. Preferably, the GBS polynucleotide is selected from one of the 176 GBS genes included 
in GBS Subset 3. In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous with at least one gene from pneumococcus but is not homologous with any GAS 
polynucleotide and does not have a homologous gene in any of the other published bacterial 
genomes at the time of the invention. Preferably, the GBS polynucleotide is selected from GBS 
Subset 3(a). 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is 
homologous with at least one gene from GBS, but is not homologous with any gene from GAS. 
Preferably, the pneumococcus polynucleotide is selected from one of the 195 Spn genes included 
in Spn Subset 2. In another embodiment, the polynucleotide is a pneumococcus polynucleotide 
which is homologous with at least one gene from GBS, but is not homologous with any gene 
from GAS and does not have a homologous gene in any other published bacterial genome at the 
time of the invention. Preferably, the pneumococcus polynucleotide is selected from Spn Subset 
2(a). 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GAS 
and pneumococcus. In one embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from pneumococcus but is not homologous with any gene 
from GBS. Preferably, the GAS polynucleotide is selected from one of the 62 GAS genes 
included in GAS Subset 3. In another embodiment, the polynucleotide is a GAS polynucleotide 
which is homologous with at least one gene from pneumococcus but is not homologous with any 
gene from GBS or with any gene of any other published bacterial genome at the time of the 
invention. Preferably, the GAS polynucleotide is selected from GAS Subset 3(a). 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is 
homologous with at least one GAS polynucleotide, but is not homologous with any GBS gene. 
Preferably, the pneumococcus polynucleotide is selected from one of the 74 Spn genes included in 
Spn Subset 3. In another embodiment, the polynucleotide is a pneumococcus polynucleotide 
which is homologous with at least one gene from GAS, but is not homologous with any gene 
from GBS or with a gene from any other published bacterial genome at the time of the invention. 
Preferably, the pneumococcus polynucleotide is selected from Spn Subset 3(a). 
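The Subset definitions above reduce to a simple decision rule on which of the two other species carry a homologue, with each "(a)" Subset being a further restriction to genes lacking a homologue in any other published bacterial genome. The Python sketch below illustrates that rule only. It assumes the presence or absence of homologues has already been determined upstream (the homology criterion itself is not specified here), and the names `COMPARISONS` and `subsets_for_gene` are hypothetical.

```python
# Hypothetical helper illustrating how the Subsets are defined.
# For each focal species, the two comparison species are listed in the order
# used by the Subset definitions: Subset 2 = homologue in the first species
# only, Subset 3 = homologue in the second species only.
COMPARISONS = {
    "GBS": ("GAS", "Spn"),
    "GAS": ("GBS", "Spn"),
    "Spn": ("GBS", "GAS"),
}


def subsets_for_gene(species: str, homologues_in: set[str],
                     other_genome_homologue: bool) -> list[str]:
    """Return the Subset labels for one gene of `species`.

    homologues_in: comparison species in which a homologue was found
        (assumed to be computed upstream, e.g. by a sequence-similarity search).
    other_genome_homologue: True if the gene has a homologue in any other
        published bacterial genome.
    """
    first, second = COMPARISONS[species]
    in_first = first in homologues_in
    in_second = second in homologues_in
    if in_first and in_second:
        number = 1  # conserved across all three species
    elif in_first:
        number = 2  # shared with the first comparison species only
    elif in_second:
        number = 3  # shared with the second comparison species only
    else:
        number = 4  # specific to the focal species
    labels = [f"{species} Subset {number}"]
    # An "(a)" Subset is a further restriction of the corresponding Subset.
    if not other_genome_homologue:
        labels.append(f"{species} Subset {number}(a)")
    return labels
```

For example, a pneumococcal gene with a GAS homologue, no GBS homologue and no homologue elsewhere falls in both Spn Subset 3 and Spn Subset 3(a).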

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to one or 
more Streptococcal species serotypes. Preferably, the polynucleotide is specific to a 
Streptococcal species serotype selected from the Streptococcal species GBS, GAS and 
pneumococcus. More preferably, the polynucleotide is specific to one or more GBS serotypes 
selected from the group consisting of GBS serotypes Ia, Ib, II, III, IV, V, VI, VII and VIII. 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is conserved across 
one or more Streptococcal species serotypes. Preferably, the polynucleotide is conserved across 
serotypes of a Streptococcal species selected from the Streptococcal species GBS, GAS and 
pneumococcus. More preferably, the polynucleotide is conserved across one or more GBS 
serotypes selected from the group consisting of GBS serotypes Ia, Ib, II, III, IV, V, VI, VII and 
VIII. 

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to one or 
more clinical isolates of a Streptococcal species. Preferably, the polynucleotide is specific to a 
Streptococcal species clinical isolate selected from the Streptococcal species GBS, GAS and 
pneumococcus. More preferably, the polynucleotide is specific to one or more GBS clinical 
isolates selected from the clinical isolates identified in Table 5. Still more preferably, the 
polynucleotide is specific to one or more GBS clinical isolates having one or more genes 
selected from the genes listed in Table 7. 

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from both GAS and pneumococcus and which varies among 
clinical isolates. In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from both GAS and pneumococcus and which is homologous 
with at least one gene from at least one of the clinical isolates identified in Table 5. In another 
embodiment, the polynucleotide is a GBS polynucleotide which is homologous to at least one 
gene from both GAS and pneumococcus and which is homologous with at least one gene from 
each of the clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from 
one of the genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous to 
at least one gene from GAS and is not homologous to any gene from pneumococcus and which 
varies among clinical isolates. In another embodiment, the polynucleotide is a GBS 
polynucleotide which is homologous to at least one gene from GAS and is not homologous to 
any gene from pneumococcus and which is homologous to at least one gene from at least one of 
the clinical isolates identified in Table 5. In another embodiment, the polynucleotide is a GBS 
polynucleotide which is homologous to at least one gene from GAS and is not homologous to 
any gene from pneumococcus and which is homologous to at least one gene from each of the 
clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from one of the 
genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous to 
at least one gene from pneumococcus and is not homologous to any gene from GAS and which 
varies among clinical isolates. In another embodiment, the polynucleotide is a GBS 
polynucleotide which is homologous to at least one gene from pneumococcus and is not 
homologous to any gene from GAS and which is homologous to at least one gene from at least 
one of the clinical isolates identified in Table 5. In another embodiment, the polynucleotide is a 
GBS polynucleotide which is homologous to at least one gene from pneumococcus and is not 
homologous to any gene from GAS and which is homologous to at least one gene from each of 
the clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from one of 
the genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is not 
homologous to any gene from GAS or pneumococcus and which varies among clinical isolates. 
In another embodiment, the polynucleotide is a GBS polynucleotide which is not homologous to 
any gene from GAS or pneumococcus and which is homologous to at least one gene from at least 
one of the clinical isolates identified in Table 5. In another embodiment, the polynucleotide is a 
GBS polynucleotide which is not homologous to any gene from GAS or pneumococcus and 
which is homologous to at least one gene from each of the clinical isolates identified in Table 5. 
Preferably, the polynucleotide is selected from one of the genes listed in Table 7. 
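The clinical-isolate embodiments above differ in whether a homologue must be found in at least one of the isolates of Table 5, in each of them, or varies among them. A minimal Python sketch of those three conditions follows. It assumes a per-isolate presence/absence call is already available, reads "varies among clinical isolates" as present in some isolates but absent from others (one possible reading; sequence-level variation is not modelled), and uses hypothetical placeholder isolate names rather than the entries of Table 5.

```python
from collections.abc import Mapping


def isolate_profile(hits: Mapping[str, bool]) -> dict[str, bool]:
    """Summarise presence of a gene's homologue across clinical isolates.

    hits maps each clinical isolate name (hypothetical placeholders; the
    actual isolates are listed in Table 5) to True if a homologue was found.
    """
    if not hits:
        raise ValueError("at least one isolate is required")
    found = [bool(v) for v in hits.values()]
    return {
        "in_at_least_one_isolate": any(found),
        "in_each_isolate": all(found),
        # Assumed reading of "varies among clinical isolates".
        "varies_among_isolates": any(found) and not all(found),
    }
```

A gene found in every isolate satisfies the "each of the clinical isolates" embodiment and, under this reading, does not vary among isolates.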

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is conserved across 
one or more clinical isolates of a Streptococcal species. Preferably, the polynucleotide is 
conserved across one or more Streptococcal clinical isolates selected from the Streptococcal 
species GBS, GAS and pneumococcus. More preferably, the polynucleotide is conserved across 
one or more GBS clinical isolates identified in Table 5. Still more preferably, the polynucleotide 
is conserved across one or more clinical isolates having one or more genes selected from the 
genes listed in Table 7. 

The invention further provides for an immunogenic composition comprising a 
polypeptide, or a fragment thereof, encoded by a polynucleotide selected from one or more of the 
Subsets of the invention. Accordingly, the invention provides for an immunogenic composition 
comprising a polypeptide encoded by a polynucleotide selected from one or more of the 
following Subsets: GBS Subset 1, GBS Subset 2, GBS Subset 3, GBS Subset 4, GAS Subset 1, 
GAS Subset 2, GAS Subset 3, GAS Subset 4, Spn Subset 1, Spn Subset 2, Spn Subset 3, Spn 
Subset 4, GBS Subset 1(a), GBS Subset 2(a), GBS Subset 3(a), GBS Subset 4(a), GAS Subset 
1(a), GAS Subset 2(a), GAS Subset 3(a), GAS Subset 4(a), Spn Subset 1(a), Spn Subset 2(a), 
Spn Subset 3(a), Spn Subset 4(a), GBS Subset 1(b), GBS Subset 2(b), GBS Subset 3(b), GBS 
Subset 4(b), GBS Subset 5, GBS Subset 6, GBS Subset 6(a), GBS Subset 7, GBS Subset 8, GBS 
Subset 9, GBS Subset 10, GBS Subset 11, GBS Subset 12, GBS Subset 12(a), GBS Subset 
12(b), GBS Subset 12(c), GBS Subset 12(d), GBS Subset 12(e), GBS Subset 12(f), GBS Subset 
12(g), GBS Subset 12(h), GBS Subset 12(i), GBS Subset 12(j), GBS Subset 12(k), GBS Subset 
12(l), GBS Subset 12(m), GBS Subset 12(n), GBS Subset 12(o), GBS Subset 13(a), GBS Subset 
13(b), GBS Subset 13(c), GBS Subset 13(d), GBS Subset 13(e), GBS Subset 13(f), GBS Subset 
13(g), GBS Subset 13(h), GBS Subset 13(i), GBS Subset 13(j), GBS Subset 13(k), GBS Subset 
13(l), GBS Subset 13(m), GBS Subset 13(n), GBS Subset 13(o), GBS Subset 13(p), GBS Subset 
13(q), GBS Subset 14, GBS Subset 14(a), GBS Subset 14(b), GBS Subset 14(c), GBS Subset 
14(d), GBS Subset 14(e), GBS Subset 14(f), GBS Subset 14(g), and GBS Subset 14(h). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 1, GBS Subset 2, GBS Subset 3, and GBS Subset 4. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GAS Subset 1, GAS Subset 2, GAS Subset 3, and GAS Subset 4. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: Spn Subset 1, Spn Subset 2, Spn Subset 3, and Spn Subset 4. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 1(a), GBS Subset 2(a), GBS Subset 3(a), and GBS Subset 4(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GAS Subset 1(a), GAS Subset 2(a), GAS Subset 3(a), and GAS Subset 4(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: Spn Subset 1(a), Spn Subset 2(a), Spn Subset 3(a), and Spn Subset 4(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 1(b), GBS Subset 2(b), GBS Subset 3(b), and GBS Subset 4(b). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from GBS Subset 5. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 6 and GBS Subset 6(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 7. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 8. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 9. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 10. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 11. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 12, GBS Subset 12(a), GBS Subset 12(b), GBS Subset 12(c), GBS Subset 
12(d), GBS Subset 12(e), GBS Subset 12(f), GBS Subset 12(g), GBS Subset 12(h), GBS Subset 
12(i), GBS Subset 12(j), GBS Subset 12(k), GBS Subset 12(l), GBS Subset 12(m), GBS Subset 
12(n), and GBS Subset 12(o). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 13(a), GBS Subset 13(b), GBS Subset 13(c), GBS Subset 13(d), GBS 
Subset 13(e), GBS Subset 13(f), GBS Subset 13(g), GBS Subset 13(h), GBS Subset 13(i), GBS 
Subset 13(j), GBS Subset 13(k), GBS Subset 13(l), GBS Subset 13(m), GBS Subset 13(n), GBS 
Subset 13(o), GBS Subset 13(p), and GBS Subset 13(q). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 14, GBS Subset 14(a), GBS Subset 14(b), GBS Subset 14(c), GBS Subset 
14(d), GBS Subset 14(e), GBS Subset 14(f), GBS Subset 14(g), and GBS Subset 14(h). 

Each of the above-identified groups and Subsets may be used to create immunogenic 
compositions comprising two or more Streptococcus polypeptides. The invention thus provides 
for an immunogenic composition comprising a combination of Streptococcus polypeptides, said 
combination consisting of two, three, four, five, six, seven, eight, nine, or ten polypeptides 
selected from one of the groups identified above. Preferably, the combination consists of two, 
three, four or five polypeptides. Preferably, the polypeptides are all selected from the same 
group. Preferably, the polypeptides are selected from the same Subset described herein. The 
Streptococcus polypeptides are selected from GBS, GAS and pneumococcus polypeptides. 
Preferably, all of the polypeptides in the combination are selected from the same species. 
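Because combinations are drawn from Subsets containing hundreds of genes, the number of candidate combinations grows quickly with combination size. The Python sketch below, using only the standard library, counts and enumerates same-Subset combinations of two to ten polypeptides. The function names and gene identifiers are hypothetical, and the sketch says nothing about which combinations are immunologically preferred.

```python
from itertools import combinations
from math import comb


def combination_counts(subset_size: int, sizes=range(2, 11)) -> dict[int, int]:
    """Number of distinct k-polypeptide combinations for each k in `sizes`."""
    return {k: comb(subset_size, k) for k in sizes}


def candidate_combinations(gene_ids, k: int):
    """Return an iterator over k-member combinations of genes from one Subset."""
    if not 2 <= k <= 10:
        raise ValueError("combinations of two to ten polypeptides are described")
    return combinations(sorted(set(gene_ids)), k)


# GBS Subset 1 contains 1060 genes, so pairs alone number comb(1060, 2).
pairs_in_gbs_subset_1 = combination_counts(1060, sizes=[2])[2]
```

For GBS Subset 1 this gives 561,270 distinct pairs; for the 225 genes of GBS Subset 2 the corresponding figure is 25,200.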

For example, the composition may comprise a combination of GBS polypeptides, said 
combination consisting of two, three, four, five, six, seven, eight, nine, or ten polypeptides, 
wherein each polypeptide is encoded by a GBS polynucleotide sequence which is homologous to 
a polynucleotide sequence of both GAS and pneumococcus. Preferably, the combination 
consists of two, three, four or five polypeptides. Preferably, the GBS polynucleotide sequences 
are selected from GBS Subset 1. 

As another example, the composition may comprise a combination of GBS polypeptides, 
said combination consisting of two, three, four or five polypeptides, wherein each polypeptide is 
encoded by a GBS polynucleotide sequence which is homologous to a polynucleotide sequence 
of GAS. Preferably, the GBS polynucleotide sequences are selected from GBS Subset 2. 

The composition may comprise a combination of GBS polypeptides, said combination 
consisting of two, three, four or five polypeptides, wherein each polypeptide is encoded by a 
GBS polynucleotide sequence which is homologous to a polynucleotide sequence of 
Streptococcus pneumoniae. Preferably, the GBS polynucleotide sequences are selected from GBS 
Subset 3. 

The composition may comprise a combination of GBS polypeptides, said combination 
consisting of two, three, four or five polypeptides, wherein each polypeptide is encoded by a 
GBS serotype polynucleotide sequence which is homologous to at least one other GBS serotype. 
Preferably, the GBS polypeptides are encoded by GBS serotype polynucleotide sequences which 
are homologous to at least one other GBS serotype. 

The invention further provides for an immunogenic composition comprising a 
polypeptide or a fragment thereof comprising a fusion protein encoded by one or more of the 
polynucleotides included in the Subsets of the invention. 

The invention further provides a method for designing an immunogenic composition, 
such as a vaccine, by selecting one or more polypeptides encoded by a polynucleotide selected 
from one or more of the Subsets of the invention. Preferably, the immunogenic compositions of 
the invention comprise at least two, three, four or five polypeptides encoded by polynucleotides 
within the same Subset. 

The invention provides a method for raising an immune response in a patient by 
administering any one of the immunogenic compositions set forth above. The choice of 
immunogenic composition means that the immune response may be reactive against all three of 
GAS, GBS and pneumococcus, may be reactive against only two of the three, or may be reactive 
only against GBS. 

Each of the immunogenic compositions described above may be prepared and 
administered instead as a polynucleotide where the polypeptide is expressed in vivo. 

The immune response is preferably an antibody response. It may be a protective immune 
response. The patient is preferably a human. 

The immunogenic compositions of the invention may further comprise an adjuvant, as 
discussed in further detail below. 

Essential genes and knockouts 

The invention provides a Streptococcus bacterium wherein one or more genes within any 
of the Subsets of this invention have been knocked out. The choice of Subset means that the 
knocked out gene may be, for instance, a gene found in GBS but not in GAS or pneumococcus 
(e.g. which is involved in the pathogenesis of GBS, but not in the pathogenesis of GAS or 
pneumococcus, such as binding GBS cellular targets). 

Techniques for producing knockout bacteria are well known, and knockout Streptococci 
of various species have been reported [e.g. Margolis et al. (2001) Antimicrob. Agents Chemother. 
45:2432-2435; Zhang et al. (2000) Cell 102:827-837; Nizet et al. (2000) Infect. Immun. 
68:4245-4254; Nizet et al. (1997) Adv. Exp. Med. Biol. 418:627-630; etc.]. 

The knockout mutation may be situated in the coding region of the gene or may lie within 
its transcriptional control regions (e.g. within its promoter). 

The knockout mutation will reduce the level of mRNA encoding the corresponding 
polypeptide to <1% of that produced by the wild-type bacterium, preferably <0.5%, more 
preferably <0.1%, and most preferably to 0%. 
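The knockout thresholds above are expressed relative to the wild-type level. The following Python sketch shows that arithmetic. It assumes mutant and wild-type mRNA levels have been measured on the same scale (for example by quantitative RT-PCR, which the text does not specify), and the function names and tier labels are hypothetical.

```python
def residual_percent(mutant_level: float, wild_type_level: float) -> float:
    """Mutant expression as a percentage of the wild-type level."""
    if wild_type_level <= 0:
        raise ValueError("wild-type level must be positive")
    return 100.0 * mutant_level / wild_type_level


def knockout_tier(percent: float) -> str:
    """Map a residual percentage onto the thresholds stated in the text."""
    if percent == 0:
        return "most preferred (0%)"
    if percent < 0.1:
        return "more preferred (<0.1%)"
    if percent < 0.5:
        return "preferred (<0.5%)"
    if percent < 1.0:
        return "meets threshold (<1%)"
    return "above threshold"
```

For instance, a mutant retaining 5 units of transcript against 1000 in the wild type sits at exactly 0.5%, which meets the basic <1% threshold but not the preferred <0.5% one.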

The knockout mutants of the invention may be used as immunogenic compositions (e.g. 
as vaccines) to prevent streptococcal infection. Such a vaccine may include the mutant as a live 
attenuated bacterium. 

The knockout mutants of the invention may be used to determine whether genes are 
essential for bacterial survival, either under normal or stress conditions. 

Antisense 

The invention provides a single-stranded nucleic acid comprising a fragment of $x_1$ or 
more nucleotides from a nucleotide sequence selected from one of the Subsets of the invention. 
The choice of group means that the nucleic acid may be complementary to a gene sequence 
found in GBS, GAS and pneumococcus, or a gene sequence specific to GBS. 

The single-stranded nucleic acid is at least $x_1$ nucleotides long. The value of $x_1$ is at least 
7 (e.g. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 etc.). The single-stranded 
nucleic acid may be at most $x_2$ nucleotides long, wherein $x_2$ is 100 or less (e.g. 99, 98, 97, 96, 95, 
94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 
68, 67, 66, 65, 64, 63, 62, 61, 60). 

The nucleic acid is preferably of the formula $5'\text{-}(N)_a\text{-}X\text{-}(N)_b\text{-}3'$, wherein 
$0 \le a \le 15$, $0 \le b \le 15$, N is any nucleotide, and X is the fragment as defined above. The 
values of $a$ and $b$ may independently be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15. Each 
individual nucleotide N in the $(N)_a$ and $(N)_b$ portions of the nucleic acid may be the same or 
different. The length of the nucleic acid (i.e. $a+b+x_1$) is preferably $x_2$ or less. 
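The formula 5'-(N)a-X-(N)b-3' and its length limits can be made concrete with a short Python sketch. It assumes that X is taken as the reverse complement of an $x_1$-nucleotide window of the target (sense) gene sequence, so that it is complementary to the transcript, and that the flanking N residues are chosen at random; the function names and the random flank choice are illustrative only and are not taken from the text.

```python
import random
from typing import Optional

# Watson-Crick complement for DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written in A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_antisense(target: str, start: int, x1: int, a: int = 0, b: int = 0,
                    x2: int = 100, rng: Optional[random.Random] = None) -> str:
    """Assemble 5'-(N)a-X-(N)b-3', where X is antisense to target[start:start + x1].

    target is the sense strand of the gene, written 5' to 3'.
    """
    if x1 < 7:
        raise ValueError("x1 must be at least 7")
    if not (0 <= a <= 15 and 0 <= b <= 15):
        raise ValueError("a and b must each be between 0 and 15")
    if x2 > 100:
        raise ValueError("x2 must be 100 or less")
    if a + b + x1 > x2:
        raise ValueError("total length a + b + x1 exceeds x2")
    window = target[start:start + x1]
    if start < 0 or len(window) != x1:
        raise ValueError("window lies outside the target sequence")
    rng = rng or random.Random(0)  # fixed seed so the flanks are reproducible
    flank_5 = "".join(rng.choice("ACGT") for _ in range(a))
    flank_3 = "".join(rng.choice("ACGT") for _ in range(b))
    # Reverse-complementing the window yields X written 5' to 3'.
    return flank_5 + reverse_complement(window) + flank_3
```

With no flanks, `build_antisense("ATGCCCGGGTTTAAA", 0, 9)` returns `"CCCGGGCAT"`, the 9-nucleotide antisense of the first nine bases of the target.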

Antisense inhibition of streptococcal gene expression is known, e.g. Sato et al. (1998) 
FEMS Microbiol. Lett. 159:241-245. Antibacterial antisense techniques are also disclosed in 
international patent applications WO99/02673 and WO99/13893. 

The single-stranded nucleic acid may reduce the level of polypeptide expression from the 
complementary gene to <1% of that produced by the wild-type bacterium, preferably <0.5%, 
more preferably <0.1%, and most preferably to 0%. 

Antisense experiments may be used to determine whether genes are essential for bacterial 
survival, either under normal or stress conditions. 

Screening methods 

The invention provides a method for screening compounds, wherein the method involves 
contacting the compounds with a polypeptide expressed by one or more of the polynucleotides 
selected from one of the Subsets of the invention. The method may be for screening for agonists 
or antagonists of the polypeptides, antibiotics, etc. The choice of group means, for instance, that 
the method may be used for identifying an antibiotic with broad anti-streptococcal activity, or 
for identifying an antibiotic specific to GBS. 

Potential compounds for screening include small organic molecules, peptides, peptoids, 
polypeptides, lipids, metals, nucleotides, nucleosides, aptamers, polyamines, antibodies, and 
derivatives thereof. Small organic molecules preferably have a molecular weight between 50 and 
about 2,500 daltons, and most preferably in the range 200-800 daltons. Complex mixtures of 
substances, such as extracts containing natural products, compound libraries or the products of 
mixed combinatorial syntheses, may also contain potential antagonists. 

Typically, a polypeptide is incubated with a test compound, and the mixture is then tested 
to see if the polypeptide and test compound interact, or to see if the polypeptide's activity is 
inhibited. 

For preferred high-throughput screening methods, all the biochemical steps for this assay 
are performed in a single solution in, for instance, a test tube or microtitre plate, and the test 
compounds are analysed initially at a single compound concentration. For the purposes of 
high-throughput screening, the experimental conditions are adjusted to achieve a desired 
proportion of test compounds identified as "positive" compounds from amongst the total 
compounds screened. 
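The single-concentration screen above can be sketched as a control-normalised hit call that also reports the proportion of compounds scored positive, which is the quantity the experimental conditions are tuned against. This Python sketch assumes an inhibition assay read out as one signal per well, with a negative control (uninhibited) and a positive control (fully inhibited); the 50% cut-off, the compound identifiers and the function names are hypothetical.

```python
from statistics import mean


def percent_inhibition(signal: float, neg_ctrl: float, pos_ctrl: float) -> float:
    """0% = uninhibited (negative-control) signal, 100% = fully inhibited."""
    span = neg_ctrl - pos_ctrl
    if span == 0:
        raise ValueError("positive and negative controls give the same signal")
    return 100.0 * (neg_ctrl - signal) / span


def call_hits(signals: dict[str, float], neg_ctrls: list[float],
              pos_ctrls: list[float], threshold: float = 50.0):
    """Return (sorted hit IDs, proportion of compounds called positive)."""
    if not signals:
        raise ValueError("no compounds to score")
    neg, pos = mean(neg_ctrls), mean(pos_ctrls)
    hits = sorted(cid for cid, s in signals.items()
                  if percent_inhibition(s, neg, pos) >= threshold)
    return hits, len(hits) / len(signals)
```

If the returned proportion is far above or below the desired level, the assay conditions (or, in this simplified sketch, the threshold) would be adjusted before the full library is screened.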

The invention also provides a compound identified using these methods. These can be 
used to treat or prevent streptococcal infection. The compound preferably has an affinity for the 
adhesion-specific protein of at least $10^{-7}$ M, e.g. $10^{-8}$ M, $10^{-9}$ M, $10^{-10}$ M or tighter. 

Distinguishing Streptococcal species 

The invention provides a method for determining whether a Streptococcus bacterium of 
interest is or is not in the species agalactiae, pyogenes or pneumoniae, comprising the step(s) of: 
(a) contacting the bacterium with a nucleic acid probe comprising the sequence of a gene 
selected from one of the Subsets of the invention; and/or (b) contacting the bacterium with an 
antibody which binds to a polypeptide encoded by one or more of the polynucleotides of one or 
more of the Subsets of the invention. The choice of group means, for instance, that the method 
may be used for distinguishing GBS from GAS and from pneumococcus, or for confirming that a 
bacterium is not a GAS or pneumococcus. 

The method will typically include the further step of detecting the presence or absence of
an interaction between the bacterium of interest and the nucleic acid or protein. 

The bacterium of interest may be in a cell culture, for example, or may be within a 
biological sample believed or known to contain a streptococcus. It may be intact or may be, for 
instance, lysed. 

The term "biological sample" encompasses a variety of sample types obtained from an 
organism and can be used in a diagnostic or monitoring assay. The term encompasses blood and 
other liquid samples of biological origin, solid tissue samples, such as a biopsy specimen or 
tissue cultures or cells derived therefrom and the progeny thereof. The term encompasses 
samples that have been manipulated in any way after their procurement, such as by treatment 
with reagents, solubilization, or enrichment for certain components. The term encompasses a 
clinical sample, and also includes cells in cell culture, cell supernatants, cell lysates, serum, 
plasma, biological fluids, and tissue samples. 
GBS 2603 Type V Genomic Sequence

Applicants have determined the complete genome sequence of GBS clinical type V isolate
2603 V/R and compared this sequence with other GBS strains,
with other species of pathogenic Streptococci and with other known bacterial species. The entire
genomic sequence is available as of August 26, 2002 at http://www.tigr.org. This genomic
sequence is incorporated herein by reference in its entirety. The genomic sequence of GBS type
V isolate 2603 V/R is also set forth in International Patent Application WO 02/34771.

In one embodiment, the invention relates to the polynucleotides, and fragments and
derivatives thereof, set forth in the published genome of GBS clinical type V isolate 2603 which are
not disclosed within WO 02/34771. The invention further relates to polypeptides expressed by
the polynucleotides of the invention.

Applicants have predicted that the GBS 2603 genome contains approximately 2,176
genes. Each predicted gene is set forth in Table 1, listed by a SAGxxxx ORF number.
Table 1 also includes the predicted amino acid size of the predicted expressed protein and the
predicted function, if known. The sequence of each SAG reference can be obtained at the TIGR
website.

Figure 1 is a circular representation of the GBS genome and comparative hybridisations
using microarrays. A color version of Figure 1 can be found in Tettelin et al., PNAS (2002)
99(19): 12391-12396 and online at www.pnas.org. The outer circle represents predicted
coding regions on the plus strand color coded by role categories: violet indicating amino acid
biosynthesis; light blue indicating biosynthesis of cofactors, prosthetic groups, and carriers; light
green indicating cell envelope; red indicating cellular processes; brown indicating central
intermediary metabolism; yellow indicating DNA metabolism; light gray indicating energy
metabolism; magenta indicating fatty acid and phospholipid metabolism; pink indicating protein
synthesis and fate; orange indicating purines, pyrimidines, nucleosides, and nucleotides; olive
indicating regulatory functions and signal transduction; dark green indicating transcription; teal
indicating transport and binding proteins; gray indicating unknown function; salmon indicating
other categories; blue indicating hypothetical proteins.

The second circle represents predicted coding regions on the minus strand. In the third
circle, black represents the atypical nucleotide composition curve; green represents the most atypical
regions; magenta represents insertion elements; red diamonds indicate rRNAs.

Circles 4-22 represent comparative hybridisations of strain 2603 V/R with 19 GBS
strains. Cy3/Cy5 (2603 V/R signal/test strain) ratio cutoffs were defined arbitrarily as follows:
Cy3/Cy5 = 1.0-3.0, gene present in the test strain (no color added); Cy3/Cy5 = 3.0-10.0,
ambiguous result (blue); Cy3/Cy5 > 10, gene absent in the test strain (red).

Circles 4-9 represent type Ia strains 090, 515, A909, Davis, and DK8. Circles 10-11
represent type Ib strains S7 7357b and H36B. Circles 12-13 represent type II strains 18RS21
and DK21. Circles 14-18 represent type III strains COH1, COH31, D136C, M732 and M781. Circle
19 represents type V strain CJB111. Circles 20-21 represent type VIII strains SMU014 and
JM9130013. Circle 22 represents nontypable (NT) strain CJB110. Throughout Figure 1,
varying regions of five or more consecutive genes are indicated by yellow bullets.

Figure 4 depicts a linear representation of the GBS genome. The location of predicted
coding regions color-coded by biological role (see Figure 1) is displayed. Arrowed boxes
represent the direction of transcription for each ORF. The number of membrane-spanning
domains predicted by TopPred is displayed as lipid bilayers on top of ORFs, only for those
whose products have five or more predicted membrane-spanning regions. Genes coding for
rRNAs (16S, 23S, 5S) and tRNAs (clover leaf structure with number of genes) are indicated.
Predicted Rho-independent transcriptional terminators are represented by hairpins.

ORFs were predicted by GLIMMER (See Delcher, et al., (1999) Nucleic Acids Res. 27,
4636-4641 and Salzberg, et al., (1998) Nucleic Acids Res. 26, 544-548) trained with ORFs
larger than 600 base pairs from the genomic sequence and GBS genes available in GenBank. All
predicted proteins larger than 30 amino acids were searched against a nonredundant protein
database (See Fleischmann, et al., (1995) Science 269, 496-512). Frame-shifts and point
mutations were detected and corrected where appropriate; those remaining were annotated as
"authentic frame-shift" or "authentic point mutation". Protein membrane-spanning domains
were identified by TOPPRED (See Claros, et al., (1994) Comput. Appl. Biosci. 10, 685-686).
Candidate lipoprotein signal peptides (See Hayashi et al., (1990) J. Bioenerg. Biomembr. 22,
451-471) were flagged by N-terminal exact matches to the pattern
{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C. Putative signal peptides were identified by using
SIGNALP (Nielsen, et al., (1997) Protein Eng. 10, 1-6). Two sets of hidden Markov models
were used to determine ORF membership in families and superfamilies: PFAM Ver. 5.5
(Bateman, et al., (2000) Nucleic Acids Res. 28, 263-266) and TIGRFAMS 1.0 (Haft et al.,
(2001) Nucleic Acids Res. 29, 41-43). Domain-based paralogous families were built by
performing all-versus-all searches on the protein sequences by using a modified version of a
previously described method (Nierman, et al., (2001) Proc. Natl Acad. Sci. USA 98, 4136-4141).
Potential lineage-specific gene duplications were estimated by identification of ORFs
more similar to ORFs within the GBS genome than to ORFs from other complete genomes. All
ORFs were searched with FASTA3 (Pearson (2000) Methods Mol Biol 132, 185-219) against
all ORFs from the complete genomes, and matches with a FASTA P value of 10^-15 or lower were
considered significant.
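The lipoprotein pattern above is written in PROSITE-style notation, where {DERK}(6) means six residues that are not D, E, R or K. As a minimal sketch, assuming Python's `re` module and a hypothetical 40-residue N-terminal window (the window length is not given in the source), the pattern can be translated into a regular expression:

```python
import re

# Regex translation of {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
LIPOPROTEIN_PATTERN = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def has_lipoprotein_signal(protein: str, window: int = 40) -> bool:
    """Return True if the pattern occurs within the first `window` residues.

    The 40-residue window is a hypothetical choice for illustration only.
    """
    return LIPOPROTEIN_PATTERN.search(protein[:window].upper()) is not None


print(has_lipoprotein_signal("MNLFATLLAGCQDNKTE"))  # True
print(has_lipoprotein_signal("MKKKKKKKKKKKKKKK"))   # False
```

Restricting the search to the N-terminus mirrors the "N-terminal exact matches" wording; a motif occurring deep inside a protein is not evidence of a lipoprotein signal peptide.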

The genome consists of a circular chromosome of 2,160,266 base pairs with a G+C
content of 35.7%. Base pair one of the chromosome was assigned within the putative origin of
replication. The genome contains 80 tRNAs, 7 rRNAs, and 3 sRNAs. Approximately 78% of the
2,176 predicted genes are transcribed in the same direction as that of DNA replication, a feature
also observed in S. pn. and other low-GC Gram-positive organisms.

Biological roles were assigned to 1,409 (65%) of the predicted genes according to a classification
scheme adapted from Riley (1993) Microbiol Rev. 57, 862-952. Another 527 predicted
proteins (24%) matched proteins of unknown function, and the remaining 240 (11%) had no
database match. The expression of 50 of these hypothetical proteins was confirmed by Western
blot analysis, and the proteins were annotated as "proteins of unknown function." A total of 339
paralogous protein families were identified in strain 2603, containing 941 predicted proteins
(43% of the total).

The Western blot analysis was conducted as follows. GBS strain 2603 V/R cells were
grown in Todd-Hewitt broth (Difco) to OD600nm = 0.5. The culture was centrifuged for 20
minutes at 5,000 rpm. The supernatant was discarded, and bacteria were washed once with PBS,
resuspended in 2 ml of 50 mM Tris-HCl pH 6.8 containing 400 units of Mutanolysin (Sigma),
and incubated 2 hours at 37°C. After three freeze-thaw cycles, cellular debris was
removed by centrifugation at 14,000 rpm for 10 minutes, and the protein concentration of the
supernatant was measured by the Bio-Rad Protein assay, with BSA as a standard. Purified
recombinant proteins (50 ng) and total cell extracts (25 ng) derived from GBS serotype V 2603
V/R strain were separated by SDS-PAGE and electroblotted onto nitrocellulose membranes for 1
hour at 100 V. The membranes were saturated by overnight incubation at 4°C in 5% skimmed
milk and 0.1% Tween 20 in PBS and incubated for 1 hour at room temperature with sera from
immunized mice diluted 1:500-1:1,000 in saturation buffer. To reduce background due to
antibodies raised against contaminating E. coli proteins, sera were preincubated with E. coli
protein extracts adsorbed on nitrocellulose strips. The membranes were washed twice in 3%
skimmed milk and 0.1% Tween 20 in PBS and incubated for 1 hour with a 1:1,000 dilution of
horseradish peroxidase-conjugated anti-mouse Ig (DAKO). After washing with 0.1% Tween 20
in PBS, the membranes were developed with the Opti-4CN Substrate Kit (Bio-Rad).
Table 2 comprises a list of predicted and experimentally characterized surface and
secreted proteins from GBS. Candidate signal peptides and lipoprotein motifs were predicted
with PSORT [Nakai, K. & Horton, P. (1999) Trends Biochem Sci 24, 34-6] and other methods
(see methods), and sortase motifs (LPxTG) were detected using the FINDPATTERNS program of
the GCG Package [Devereux, J., Haeberli, P. & Smithies, O. (1984) Nucleic Acids Res 12, 387-95]
and hidden Markov models. Column "Other" indicates proteins that carry other motifs (e.g. the
integrin-binding motif RGD) or that are similar to characterized surface-exposed proteins. Western
blot results were considered positive when the antibodies revealed a predominant band of the
expected molecular weight on the total protein extracts of S. agalactiae strain 2603 V/R; ORFs
without + or - in this column were not tested by western blot. FACS analyses were performed
for western blot positive proteins only. Western blot and FACS data are displayed only for
proteins carrying at least one of the other motifs shown in the table. Column "GBS specific"
indicates genes unique to S. agalactiae (when compared to other completely sequenced
genomes) that are present in all the S. agalactiae strains tested in comparative genome
hybridization analyses. Finally, only proteins carrying fewer than 3 predicted transmembrane
domains are shown in the table; other proteins are likely to be embedded in the cytoplasmic
membrane and are probably not exposed on the organism's surface.
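The source detected sortase motifs with GCG FINDPATTERNS and hidden Markov models. As a simpler stand-in, a plain regular-expression scan for LPxTG (and, for the "Other" column, RGD) combined with the fewer-than-3 transmembrane domain rule might look like the sketch below. The function names and the idea of passing a precomputed transmembrane count (e.g. from a predictor such as TopPred) are hypothetical, and the filter is a simplification of the full Table 2 criteria.

```python
import re

# "x" in LPxTG is any residue; RGD is the integrin-binding motif.
MOTIFS = {
    "LPxTG": re.compile(r"LP.TG"),
    "RGD": re.compile(r"RGD"),
}


def find_motifs(protein: str) -> dict:
    """Return 1-based start positions of each motif in the protein sequence."""
    seq = protein.upper()
    return {
        name: [m.start() + 1 for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


def table2_candidate(protein: str, n_transmembrane: int) -> bool:
    """Hypothetical filter: a motif is present and fewer than 3 TM domains are predicted."""
    return n_transmembrane < 3 and any(find_motifs(protein).values())


print(find_motifs("MKTLPETGEEKRGDA"))  # {'LPxTG': [4], 'RGD': [12]}
```

A regex scan reports every exact match; the hidden Markov models used in the source can also recognise degenerate anchor motifs that a fixed pattern would miss.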

FACS data were collected as follows. GBS 2603 V/R strain cells were grown in Todd-Hewitt
broth (Difco) to OD600nm = 0.5. The culture was centrifuged for 20 minutes at 5,000
rpm, and bacteria were washed once with PBS, resuspended in PBS containing 0.05%
paraformaldehyde, and incubated for 1 hour at 37°C and then overnight at 4°C. Fifty microliters
of fixed bacteria (OD600nm 0.1) were washed once with PBS, resuspended in 20 µl of newborn
calf serum (Sigma), and incubated for 1 hour at 4°C in 100 µl of preimmune or immune sera
diluted 1:200 in dilution buffer (PBS, 20% newborn calf serum, 0.1% BSA). After
centrifugation and washing with 200 µl of washing buffer (0.1% BSA in PBS), samples were
incubated for 1 hour at 4°C with 50 µl of R-phycoerythrin-conjugated F(ab)2 goat anti-mouse
IgG (Jackson ImmunoResearch) diluted 1:100 in dilution buffer. Cells were washed with 200 µl
of washing buffer and resuspended in 200 µl of PBS. Samples were analysed by using a FACSCalibur
apparatus (Becton Dickinson), and data were analyzed by using CELL QUEST (Becton
Dickinson). A shift in mean fluorescence intensity of >75 channels compared with preimmune
sera from the same mice was considered positive. This cutoff was determined from the mean
plus two standard deviations of shifts obtained with control sera raised against mock-purified
recombinant proteins from cultures of E. coli carrying the empty expression vector and included
in every experiment. Artifacts due to bacterial lysis were excluded by using antisera raised
against six different known cytoplasmic proteins, all of which gave negative results.
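The positivity rule above (a cutoff equal to the mean plus two standard deviations of the shifts obtained with control sera) can be expressed as a short calculation. This sketch assumes Python's `statistics` module and the sample standard deviation (the source does not say which estimator was used); the control values are invented for illustration, whereas the experiments described yielded a cutoff of 75 channels.

```python
from statistics import mean, stdev


def facs_cutoff(control_shifts):
    """Cutoff = mean + 2 * sample standard deviation of control-serum shifts."""
    return mean(control_shifts) + 2 * stdev(control_shifts)


def is_surface_positive(immune_mfi, preimmune_mfi, cutoff):
    """Positive when the MFI shift over preimmune serum exceeds the cutoff."""
    return (immune_mfi - preimmune_mfi) > cutoff


# Hypothetical shifts (in channels) from mock-purified control sera.
cutoff = facs_cutoff([10, 20, 30])
print(cutoff)                                 # 40.0
print(is_surface_positive(180, 100, cutoff))  # True  (shift of 80 channels)
print(is_surface_positive(130, 100, cutoff))  # False (shift of 30 channels)
```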

Regions of Atypical Nucleotide Composition. 

These regions were identified by χ² analysis: the distribution of all 64 trinucleotides
(3-mers) was computed for the complete genome in all six reading frames, followed by the 3-mer
distribution in 2,000-bp windows. Windows overlapped by 1,000 bp. For each window, the χ²
statistic on the difference between its 3-mer content and that of the whole genome was
computed.
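A minimal sketch of this windowed χ² computation follows. It assumes that counting overlapping trinucleotides on both strands covers all six reading frames, that expected window counts are the window total scaled by genome-wide 3-mer frequencies, and that 3-mers containing non-ACGT characters are ignored; none of these details is specified in the source. The toy genome is invented.

```python
import random
from collections import Counter
from itertools import product

TRIMERS = ["".join(p) for p in product("ACGT", repeat=3)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def trimer_counts(seq):
    """Count overlapping 3-mers on both strands, i.e. in all six reading frames."""
    revcomp = seq.translate(COMPLEMENT)[::-1]
    counts = Counter()
    for strand in (seq, revcomp):
        for i in range(len(strand) - 2):
            counts[strand[i:i + 3]] += 1
    return counts


def chi_square_windows(genome, window=2000, step=1000):
    """Return (window_start, chi2) for overlapping windows along the genome."""
    genome = genome.upper()
    genome_counts = trimer_counts(genome)
    genome_total = sum(genome_counts[t] for t in TRIMERS)
    if genome_total == 0:
        return []
    results = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        win_counts = trimer_counts(genome[start:start + window])
        win_total = sum(win_counts[t] for t in TRIMERS)
        chi2 = 0.0
        for t in TRIMERS:
            expected = win_total * genome_counts[t] / genome_total
            if expected > 0:
                chi2 += (win_counts[t] - expected) ** 2 / expected
        results.append((start, chi2))
    return results


# Toy example: AT-rich background with a GC-rich island inserted at position 6,000.
rng = random.Random(0)


def background(n):
    return "".join(rng.choice("AAATTTGC") for _ in range(n))


toy = background(6000) + "GC" * 1000 + background(4000)
scores = chi_square_windows(toy)
print(max(scores, key=lambda s: s[1])[0])  # 6000
```

The window that exactly covers the inserted island scores highest, which is the behaviour used to flag candidate horizontally transferred regions.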

In Silico Genome Comparisons 

The protein sets of S. agalactiae, Streptococcus pneumoniae and S. pyogenes were
compared by using FASTA3. A general description of the FASTA3 sequence comparison
program is given in Pearson, W.R., "Flexible Sequence Similarity Searching with the
FASTA3 Program Package", (2000) Methods Mol Biol, 132: 185-219. Shared genes were
defined using a FASTA3 P value cutoff of 10^-15. These shared genes, and the genes that S. agalactiae
did not share with the other streptococci using this cutoff, were subsequently searched against all
completely sequenced genomes, and genes were defined as unique to streptococci or S. agalactiae
when they did not share similarity with any other gene sets with a FASTA3 P value of
10^-5 or lower. The use of two cutoffs provides for a more stringent analysis of shared or unique
genes.
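The two-cutoff logic can be sketched as a small decision function. It assumes that each S. agalactiae gene has already been reduced to its best FASTA3 P value against the other two streptococci and its best P value against all other (non-streptococcal) complete genomes. The function name, inputs and category labels are illustrative, and treating a weak streptococcal hit (between 10^-15 and 10^-5) as ruling out "unique to S. agalactiae" is an interpretation of the text. A missing hit is represented by `math.inf`.

```python
import math

SHARED_CUTOFF = 1e-15  # P value defining genes shared with other streptococci
UNIQUE_CUTOFF = 1e-5   # any hit at or below this P value rules out "unique"


def classify_gbs_gene(best_p_streptococci: float, best_p_other_genomes: float) -> str:
    """Classify one S. agalactiae gene from its best FASTA3 P values (math.inf = no hit)."""
    if best_p_streptococci <= SHARED_CUTOFF:
        # Shared with at least one other streptococcus.
        if best_p_other_genomes > UNIQUE_CUTOFF:
            return "unique to streptococci"
        return "shared with streptococci and other genomes"
    if min(best_p_streptococci, best_p_other_genomes) > UNIQUE_CUTOFF:
        return "unique to S. agalactiae"
    return "not unique"


print(classify_gbs_gene(1e-40, math.inf))  # unique to streptococci
print(classify_gbs_gene(1e-40, 1e-20))     # shared with streptococci and other genomes
print(classify_gbs_gene(math.inf, 0.3))    # unique to S. agalactiae
print(classify_gbs_gene(1e-8, math.inf))   # not unique
```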

Figure 2 is a schematic representation of in silico comparisons between streptococci. The 
protein sets of GBS, S. pn., and GAS were compared by using FASTA3. Numbers under the 
species name indicate genes that are not shared with the other species; values in parentheses are
the number of proteins in each species (excluding frame-shifted and degenerated genes). 
Numbers in the intersections indicate genes shared by two or three species. These are displayed 
in the color corresponding to the species used as the query. (GBS: green; S.pn.: blue; GAS: 
red. A color version of Figure 2 can be found in Tettelin et al., PNAS (2002) 99(19): 12391 - 
12396 and online at www.pnas.org .). Numbers in any given intersection are slightly different 
due to gene duplications in some species. 

Table 3 lists genes which were shared among GBS, GAS and pneumococcus, but which 
were not found in any of the other completely sequenced genomes. The protein sets of 
S. agalactiae, S. pneumoniae, and S. pyogenes were compared using FASTA3 [Pearson, W. R. 
(2000) Methods Mol Biol 132, 185-219]. Shared genes were defined using a FASTA3 P value
cutoff of 10^-15. These shared genes and genes that S. agalactiae did not share with the other
streptococci using this cutoff were subsequently searched against all completely sequenced
genomes, and genes were defined as unique to streptococci or S. agalactiae when they did not
share similarity with any other gene sets with a FASTA3 P value of 10^-5 or lower.

Synteny

Regions of conservation of gene synteny were computed as windows of 10 kb spanning
at least three genes whose order was conserved in the other species. Regions were merged if
they were less than 20 kb apart. The number of genes within each broad region was then
calculated.
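The merging and counting steps can be sketched as follows, assuming that conserved 10-kb windows have already been identified and are supplied as (start, end) coordinates in base pairs, and that genes are represented by hypothetical midpoint coordinates. Identifying the conserved-order windows themselves is not shown.

```python
def merge_regions(windows, max_gap=20_000):
    """Merge (start, end) windows whose gap to the previous region is < max_gap bp."""
    merged = []
    for start, end in sorted(windows):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


def genes_per_region(regions, gene_midpoints):
    """Count genes whose midpoint falls inside each merged region."""
    return [sum(start <= m <= end for m in gene_midpoints) for start, end in regions]


regions = merge_regions([(0, 10_000), (25_000, 35_000), (60_000, 70_000)])
print(regions)  # [(0, 35000), (60000, 70000)]
print(genes_per_region(regions, [1_500, 12_000, 30_000, 50_000, 65_000]))  # [3, 1]
```

Sorting first lets a single pass merge both overlapping windows (negative gap) and nearby windows (gap below 20 kb).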



Comparative Genome Hybridizations 

Comparative genome hybridizations (See Figure 1) using DNA microarrays were
performed between the sequenced type V strain 2603 V/R and 19 other GBS strains of multiple
serotypes (See Table %). Predicted genes from strain 2603 V/R were amplified by PCR and
arrayed on glass microscope slides. See Peterson, et al., (2000) J. Bacteriol 182, 6192-6202.
Genomic DNA was labelled according to protocols provided by J. DeRisi
(www.microarrays.org/Pdfs/Genomic-DNALabel B.pdf), except that the DNA was not digested
or sheared before labelling. Arrays were scanned with a GENEPIX 4000B scanner (Axon
Instruments, Foster City, CA), and individual hybridisation signals were quantitated with TIGR
SPOTFINDER. See Hegde, et al., (2000), Biotechniques 29, 548-550, 552-554, 556. Cy3/Cy5
(2603 V/R signal/test strain) ratio cutoffs were defined arbitrarily as Cy3/Cy5 = 1.0-3.0, gene
present in test strain; 3.0-10.0, ambiguous result; >10.0, gene absent. For ambiguous results,
the gene may be divergent in the test strain relative to 2603 V/R, or the gene may be absent in
the test strain while a signal is still produced by a member of a paralogous gene family or by a
repetitive element. Although the cutoffs are arbitrary, they fit well the results for the variation
of the capsule locus in the strains tested (see region 9 on Figure 1), where most genes are
slightly divergent and only a few are completely different.
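These cutoffs amount to a three-way classification of each spot. In the minimal sketch below, the inclusive upper bounds at 3.0 and 10.0 and the treatment of ratios below 1.0 as "present" are assumptions, because the source does not say how boundary or sub-1.0 values were handled.

```python
def classify_cgh_ratio(cy3_cy5: float) -> str:
    """Classify a 2603 V/R / test-strain Cy3/Cy5 ratio using the arbitrary cutoffs."""
    if cy3_cy5 <= 3.0:
        return "present"    # 1.0-3.0 (values below 1.0 also treated as present)
    if cy3_cy5 <= 10.0:
        return "ambiguous"  # divergent gene, or signal from paralogs/repeats
    return "absent"


for ratio in (1.4, 5.2, 27.0):
    print(ratio, classify_cgh_ratio(ratio))
# 1.4 present
# 5.2 ambiguous
# 27.0 absent
```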

The CGH analysis detected 1,698 genes in all of the strains, whereas 401 genes from strain 2603
V/R (18% of the gene complement) were not detected in at least one other strain, suggesting that
they are absent or significantly divergent in those strains. Two hundred sixty (38%) of the 683
genes specific to S. agalactiae when compared with the other two streptococci (Fig. 2), including
virulence determinants and surface proteins, vary among S. agalactiae strains, whereas only 47
(4%) of the genes common to all three streptococcal species, including 5 of the 6 sortases
identified in the genome, vary among strains. Thus, the in silico analysis is consistent with the
CGH analysis: genes shared by the streptococci, which are not expected to vary within this genus,
seldom vary among S. agalactiae strains.
Forty-four (25%) of the genes shared by S. agalactiae and S. pneumoniae and 44 (20%) of those
shared by S. agalactiae and S. pyogenes vary in the CGH analysis. The first set contains many
glycosyl transferases and proteins carrying a cell-wall anchor, whereas the second set contains
many phage-related genes. One hundred thirty-six of the 315 genes unique to S. agalactiae
when compared with all sequenced genomes vary among strains. These include R5, three
capsular genes, two cell wall-anchored proteins, and three transcriptional regulators. Three
hundred sixty-four (91%) of the 401 varying genes correspond to 15 regions containing more
than 5 contiguous genes. Ten of these regions display an atypical nucleotide composition in
strain 2603 V/R (Fig. 1), consistent with the possibility that they were horizontally transferred
into this strain. Two of the largest regions (region 4, a prophage, and region 7, similar to Tn916
from Enterococcus faecalis) are flanked by insertion sequence elements. The 15 regions contain
many proteins predicted to be anchored on the cell wall or surface exposed, including Rib
(region 3), sortases, glycosyl transferases, the capsule locus (region 9, divergent in all strains but
the other type V strain CJB111), and phage-related genes. Region 14 is unique to S. agalactiae
and spans 33 genes (SAG1989-SAG2021), including 25 proteins of unknown function, some of
which carry a cell-wall anchor. It is flanked by an ISL3 transposase and displays an atypical
nucleotide composition. Region 1, unique to S. agalactiae, is a possible plasmid or remnant of a
phage (SAG0218-SAG0238), contains mostly hypothetical proteins, and is flanked by a site-specific
recombinase. Region 8 is specific to S. agalactiae, comprises 20 proteins of unknown
function (SAG1018-SAG1037), most of which are predicted to be membrane associated or
secreted, and displays an atypical nucleotide composition.

The CGH results were analyzed by profile clustering, in which genes are grouped based on
their distribution patterns (Fig. 5). Sixteen clusters of five or more contiguous and
noncontiguous genes comprising a total of 300 genes were identified (Table 6). Several clusters
correspond to regions of contiguous genes described above. Some clusters of genes that do not
share sequence similarity and are located at different loci in the genome display an identical
profile. For instance, a cluster of genes containing a surface antigen (SAG0674-SAG0681)
follows the same distribution as another cluster containing only hypothetical proteins
(SAG0247-SAG0249). A putative pathogenicity protein (SAG2063) also clusters with a region containing
several glycosyl transferases and Sec proteins (SAG1447-SAG1462).
Profile clustering was also used to group strains based on similarity of gene content (Fig. 5).
In addition, the sequences of 19 genes from each of 11 S. agalactiae strains were determined
after PCR amplification and used for phylogenetic analyses. The strains were the following: type
Ia, 090 and A909; type Ib, H36B; type II, 18RS21; type III, COH1, M732 and M781; type V,
2603 V/R and 1169NT1; type VIII, JM9130013; and nontypeable strain CJB110. The set
comprised 8 housekeeping genes and 11 genes coding for proteins predicted to be
surface-exposed (Table 7).

The profile clustering was conducted as follows. Information on the presence and absence of genes
based on the comparative genome hybridisation results was used to group genes based on their
distribution patterns. The analysis used was essentially identical to that used for phylogenetic
profile analysis. See Pellegrini, et al., (1999) Proc. Natl Acad. Sci. USA 96, 4285-4288.
Each gene was assigned a binary profile based on its presence or absence across the different
strains, with presence defined by a Cy3/Cy5 ratio < 3.0 and absence by a ratio > 3.0. The gene
profiles were then clustered by using the single-linkage clustering algorithm with column weighting
(all with default settings) of CLUSTER (http://rana.lbl.gov). The CLUSTER program also groups
the strains (columns) based on similarity of gene profiles. Clusters of genes and strains were
viewed by using TREEVIEW (http://rana.lbl.gov).
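The idea of binary profiles and single-linkage grouping can be illustrated with a short sketch. It is not the CLUSTER program: it omits column weighting, uses the Hamming distance between profiles as an assumed metric, and cuts the single-linkage tree at a chosen `max_distance` by joining any two genes whose profiles differ at no more than that many strains. Treating a ratio of exactly 3.0 as "absent" is also an assumption. Gene names and ratios are invented. With `max_distance=0`, genes with identical profiles collapse into one cluster, as in the single horizontal lines of Figure 5.

```python
from itertools import combinations


def binary_profile(ratios, cutoff=3.0):
    """1 = present (Cy3/Cy5 < cutoff), 0 = absent; one entry per test strain."""
    return tuple(1 if r < cutoff else 0 for r in ratios)


def hamming(a, b):
    """Number of strains at which two presence/absence profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(profiles, max_distance):
    """Single-linkage clusters cut at max_distance, computed with union-find."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for a, b in combinations(profiles, 2):
        if hamming(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    clusters = {}
    for gene in profiles:
        clusters.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical Cy3/Cy5 ratios for four genes across four test strains.
ratios = {
    "geneA": [1.2, 15.0, 2.0, 12.0],
    "geneB": [2.5, 20.0, 1.1, 11.0],
    "geneC": [1.0, 1.0, 1.0, 1.0],
    "geneD": [1.5, 1.2, 2.0, 30.0],
}
profiles = {gene: binary_profile(r) for gene, r in ratios.items()}
print(single_linkage(profiles, 0))  # [['geneA', 'geneB'], ['geneC'], ['geneD']]
print(single_linkage(profiles, 1))  # [['geneA', 'geneB', 'geneC', 'geneD']]
```

The second call shows the chaining behaviour of single linkage: geneA and geneC differ at two strains, yet they join because each is within one strain of geneD.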

Phylogenetic trees were inferred for the complete set of 19 genes and for the subsets of
housekeeping and surface-exposed genes. Because the branching patterns in all three trees were
identical, only the tree of the 19 genes is shown in Fig. 3. The degree of polymorphism of the
housekeeping and the surface-exposed genes is similar (~1 variable site among all of the strains
per 100 bp).

The sequences of genes from the different strains were aligned by using CLUSTALW
(See Thompson (1994), Nucleic Acids Res, 22, 4673-4680) and trimmed to remove
ambiguously aligned regions. Phylogenetic trees of individual genes and of concatenated
alignments of multiple genes were inferred by using maximum likelihood methods of PAUP*
4.0b10 (Sinauer, Sunderland, MA). Bootstrap analysis was carried out using PAUP* as well.
The possibility of recombination among strains was examined by analysis of sequence
variation using SIMPLOT (S.C. Ray) and analysis of phylogenetic heterogeneity using
MACCLADE (Sinauer).

Analysis of this variation showed no evidence for major recombination events between
the strains. There were no long stretches of polymorphic sites that strongly supported other trees
(analysis with MACCLADE), and there were no significant crossover events in plots of sequence
similarity between strains (analysis with SIMPLOT). Some strain groupings (clades) generated
by phylogenetic analysis were similar to clusters from the profile analysis (type III strains M781,
M732 and COH1; type Ia strain 090 and nontypable strain CJB110), whereas others were
different, possibly because of the aforementioned problems with the profile clustering. In both
the phylogenetic analysis and the profile clustering, there is serotype-dependent and -independent
clustering (Figs. 3 and 5). The presence of strains of the same serotype in different clades or
clusters could be due to lateral gene transfer.

Figure 5 demonstrates phylogenetic profiling of GBS strains based on comparative 
genome hybridisations. The information on presence and absence of genes based on the 
microarray comparative genome hybridization results was used for phylogenetic profile analysis. 
The presence of a particular gene or gene cluster is indicated in the figure by a red square and the 
absence of a gene or cluster by a black square. The relationship between strains based on this 
analysis is depicted by the tree at the top of the figure. The strains and their serotypes are 
indicated (NT: nontypeable). Clusters with identical profiles are reduced to a single horizontal 
line and the number of genes in each cluster is indicated on the right. The clusters of 5 or more 
genes, labeled in red text and numbered, are listed in Table 6. The 1698 genes shared by all 19 
strains are labeled in green text. 

Figure 3 depicts a phylogenetic tree of GBS strains based on PCR sequences. The
sequences of 19 genes (Table 7) from each of 11 GBS strains were aligned and trimmed to
remove ambiguously aligned regions, and phylogenetic trees were inferred. Strain names are 
indicated in bold, and serotypes are indicated under the strain names. Bootstrap values are 
indicated on the branches. 

Techniques 

A summary of standard techniques and procedures which may be employed in order to 
perform the invention (e.g. to utilise the disclosed sequences for vaccination or diagnostic 
purposes) follows. This summary is not a limitation on the invention, but gives examples that 
may be used, but are not required. 

General 

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of
molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art.
Such techniques are explained fully in the literature, e.g. Sambrook, Molecular Cloning: A Laboratory
Manual, Second Edition (1989) or Third Edition (2000); DNA Cloning, Volumes I and II (D.N. Glover ed.
1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid Hybridization (B.D. Hames & S.J.
Higgins eds. 1984); Transcription and Translation (B.D. Hames & S.J. Higgins eds. 1984); Animal Cell
Culture (R.I. Freshney ed. 1986); Immobilized Cells and Enzymes (IRL Press, 1986); B. Perbal, A Practical
Guide to Molecular Cloning (1984); the Methods in Enzymology series (Academic Press, Inc.), especially
volumes 154 & 155; Gene Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987,
Cold Spring Harbor Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and
Molecular Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and
Practice, Second Edition (Springer-Verlag, N.Y.); and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C.C. Blackwell eds. 1986).
Standard abbreviations for nucleotides and amino acids are used in this specification.
Further Definitions 

A composition containing X is "substantially free of" Y when at least 85% by weight of the total X+Y in
the composition is X. Preferably, X comprises at least about 90% by weight of the total of X+Y in the
composition, more preferably at least about 95% or even 99% by weight.
The term "comprising" means "including" as well as "consisting", e.g. a composition "comprising" X may
consist exclusively of X or may include something additional, e.g. X + Y.

The singular forms "a", "an", and "the" include plural referents unless the context clearly dictates
otherwise. Thus, for example, reference to "a polynucleotide" includes a plurality of such polynucleotides
and reference to "an epithelial cell" includes reference to one or more cells and equivalents thereof known
to those skilled in the art, etc.

The term "heterologous" refers to two biological components that are not found together in nature. The
components may be host cells, genes, or regulatory regions, such as promoters. Although the heterologous
components are not found together in nature, they can function together, as when a promoter heterologous
to a gene is operably linked to the gene. Another example is where a Streptococcal sequence is heterologous
to a mouse host cell. A further example would be two epitopes from the same or different proteins which
have been assembled in a single protein in an arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous unit of
polynucleotide replication within a cell, capable of replication under its own control. An origin of
replication may be needed for a vector to replicate in a particular host cell. With certain origins of
replication, an expression vector can be reproduced at a high copy number in the presence of the appropriate
proteins within the cell. Examples of origins are the autonomously replicating sequences, which are
effective in yeast; and the viral T-antigen, effective in COS-7 cells.
A "mutant" sequence is defined as a DNA, RNA or amino acid sequence differing from but having sequence
identity with the native or disclosed sequence. Depending on the particular sequence, the degree of
sequence identity between the native or disclosed sequence and the mutant sequence is preferably greater
than 50% (e.g. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the Smith-Waterman algorithm
as described above). As used herein, an "allelic variant" of a nucleic acid molecule, or region, for which
nucleic acid sequence is provided herein is a nucleic acid molecule, or region, that occurs essentially at the
same locus in the genome of another or second isolate, and that, due to natural variation caused by, for
example, mutation or recombination, has a similar but not identical nucleic acid sequence. A coding region
allelic variant typically encodes a protein having similar activity to that of the protein encoded by the gene
to which it is being compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated
regions of the gene, such as in regulatory control regions (e.g. see US patent 5,753,235).
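Because the preferred identity thresholds are computed with the Smith-Waterman local alignment algorithm, a compact illustration may help. The sketch below is a textbook Smith-Waterman implementation with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) and the definition of percent identity (identical positions divided by aligned columns, gap columns included) are assumptions for illustration and may differ from the parameters specified elsewhere in this specification.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment of a and b; returns (best_score, percent_identity)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Substitution score for a[i-1] aligned with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    identical = columns = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1

    identity = 100.0 * identical / columns if columns else 0.0
    return best, identity


print(smith_waterman("ACGTACGT", "ACGTACGT"))  # (16, 100.0)
print(smith_waterman("ACGTTCGT", "ACGTACGT"))  # (13, 87.5)
```

Because the alignment is local, identity is measured only over the best-scoring aligned segment, not over the full length of either sequence.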
Expression systems 

The Streptococcal nucleotide sequences can be expressed in a variety of different expression systems; for 
example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast. 

i. Mammalian Systems

Mammalian expression systems are known in the art. A mammalian promoter is any DNA sequence capable
of binding mammalian RNA polymerase and initiating the downstream (3') transcription of a coding
sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiating region, which is
usually placed proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base
pairs (bp) upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase
II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream
promoter element, usually located within 100 to 200 bp upstream of the TATA box. An upstream promoter
element determines the rate at which transcription is initiated and can act in either orientation [Sambrook et
al (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A Laboratory
Manual, 2nd ed.].
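The positional rule above (a TATA box about 25-30 bp upstream of the transcription initiation site) can be expressed as a simple sequence scan. The Python sketch below is illustrative only: the motif TATAWAW (W = A or T) is an assumed, simplified consensus, and `tata_candidates` is a hypothetical helper; practical promoter prediction uses position weight matrices and many more signals.

```python
import re

# Simplified TATA consensus TATAWAW (W = A or T); an illustrative assumption.
# The zero-width lookahead lets overlapping candidates be reported.
TATA_MOTIF = re.compile(r"(?=TATA[AT]A[AT])")


def tata_candidates(seq, tss, min_dist=25, max_dist=30):
    """Return start indices of TATA-like motifs lying min_dist..max_dist bp
    upstream of the transcription start site at index `tss`."""
    hits = []
    for m in TATA_MOTIF.finditer(seq.upper()):
        distance = tss - m.start()
        if min_dist <= distance <= max_dist:
            hits.append(m.start())
    return hits


# Hypothetical promoter: motif at index 10, transcription start at index 37.
promoter = "G" * 10 + "TATAAAA" + "G" * 30
print(tata_candidates(promoter, tss=37))
```

Here the motif starts 27 bp upstream of the assumed start site, so index 10 is reported; moving the start site far downstream would return an empty list.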

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include the
SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter (Ad
MLP), and herpes simplex virus promoter. In addition, sequences derived from non-viral genes, such as the
murine metallothionein gene, also provide useful promoter sequences. Expression may be either
constitutive or regulated (inducible), depending on the promoter; some promoters can be induced with
glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described above,
will usually increase expression levels. An enhancer is a regulatory DNA sequence that can stimulate
transcription up to 1000-fold when linked to homologous or heterologous promoters, with synthesis
beginning at the normal RNA start site. Enhancers are also active when they are placed upstream or
downstream from the transcription initiation site, in either normal or flipped orientation, or at a distance of
more than 1000 nucleotides from the promoter [Maniatis et al. (1987) Science 236:1237; Alberts et al.
(1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements derived from viruses may be particularly
useful, because they usually have a broader host range. Examples include the SV40 early gene enhancer
[Dijkema et al (1985) EMBO J. 4:761] and the enhancer/promoters derived from the long terminal repeat
(LTR) of the Rous Sarcoma Virus [Gorman et al. (1982b) Proc. Natl Acad. Sci. 79:6111] and from human
cytomegalovirus [Boshart et al (1985) Cell 42:521]. Additionally, some enhancers are regulatable and
become active only in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and
Borelli (1986) Trends Genet. 2:215; Maniatis et al (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired, the
N-terminal methionine may be cleaved from the protein by in vitro incubation with cyanogen bromide.
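Cyanogen bromide cleaves peptide bonds on the C-terminal side of every methionine residue, so this approach releases the N-terminal methionine cleanly only when the rest of the protein contains no internal methionine. The following minimal Python sketch (a hypothetical helper, not part of the source) predicts the fragments; for simplicity it keeps the letter M for the residue that cyanogen bromide actually converts to homoserine lactone, and it ignores sequence contexts where cleavage is known to be inefficient.

```python
def cnbr_fragments(protein):
    """Split a protein sequence after every methionine (M), mimicking
    cyanogen bromide cleavage. The cleaved Met actually becomes homoserine
    lactone; it is still written as 'M' here for simplicity."""
    fragments, current = [], []
    for residue in protein:
        current.append(residue)
        if residue == "M":
            fragments.append("".join(current))
            current = []
    if current:
        fragments.append("".join(current))
    return fragments


# Hypothetical recombinant protein with no internal Met: only the
# N-terminal Met is released.
print(cnbr_fragments("MKTAYIAKQR"))
# An internal Met would also be cut, fragmenting the product.
print(cnbr_fragments("MAKMLE"))
```

The second call shows why the method is usually restricted to proteins lacking internal methionines.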

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric
DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for
secretion of the foreign protein in mammalian cells. Preferably, there are processing sites encoded between
the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence
fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion
of the protein from the cell. The adenovirus tripartite leader is an example of a leader sequence that provides
for secretion of a foreign protein in mammalian cells.

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells are
regulatory regions located 3' to the translation stop codon and thus, together with the promoter elements,
flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific
post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 42:349; Proudfoot and Whitelaw
(1988) "Termination and 3' end processing of eukaryotic RNA." In Transcription and splicing (ed. B.D.
Hames and D.M. Glover); Proudfoot (1989) Trends Biochem. Sci. 14:105]. These sequences direct the
transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. Examples of
transcription terminator/polyadenylation signals include those derived from SV40 [Sambrook et al (1989)
"Expression of cloned genes in cultured mammalian cells." In Molecular Cloning: A Laboratory Manual].
Usually, the above-described components, comprising a promoter, polyadenylation signal, and transcription
termination sequence, are put together into expression constructs. Enhancers, introns with functional splice
donor and acceptor sites, and leader sequences may also be included in an expression construct, if desired.
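The arrangement described above, with polyadenylation signals lying 3' of the translation stop codon, can be checked on a construct sequence with a short scan. In this Python sketch the hexamer AATAAA is used as the canonical mammalian polyadenylation signal; that choice, the helper names, and the assumption that the reading frame starts at index `start` are illustrative and not taken from the text above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
POLYA_SIGNAL = "AATAAA"  # canonical mammalian hexamer (assumption for this sketch)


def first_stop_codon(seq, start=0):
    """Index of the first in-frame stop codon, reading from `start`, or -1."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


def polya_signals_3prime(seq, start=0):
    """Positions of AATAAA hexamers lying 3' of the first in-frame stop codon."""
    seq = seq.upper()
    stop = first_stop_codon(seq, start)
    if stop == -1:
        return []
    hits = []
    pos = seq.find(POLYA_SIGNAL, stop + 3)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(POLYA_SIGNAL, pos + 1)
    return hits


# Hypothetical construct: short ORF ending in TAA, spacer, signal, tail.
construct = "ATGGCTTAA" + "CCCC" + "AATAAA" + "GGGG"
print(polya_signals_3prime(construct))
```

Hexamers that fall inside the coding sequence are deliberately ignored, since only signals 3' of the stop codon fit the construct layout described in the text.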
Expression constructs are often maintained in a replicon, such as an extrachromosomal element (eg.
plasmids) capable of stable maintenance in a host, such as mammalian cells or bacteria. Mammalian
replication systems include those derived from animal viruses, which require trans-acting factors to
replicate. For example, plasmids containing the replication systems of papovaviruses, such as SV40
[Gluzman (1981) Cell 23:175] or polyomavirus, replicate to extremely high copy number in the presence of
the appropriate viral T antigen. Additional examples of mammalian replicons include those derived from
bovine papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a prokaryotic host
for cloning and amplification. Examples of such mammalian-bacteria shuttle vectors include pMT2
[Kaufman et al. (1989) Mol Cell Biol 9:946] and pHEBO [Shimizu et al. (1986) Mol Cell Biol 6:1074].
The transformation procedure used depends upon the host to be transformed. Methods for introduction of 
heterologous polynucleotides into mammalian cells are known in the art and include dextran-mediated 
transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion, 
electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA 
into nuclei. 

Mammalian cell lines available as hosts for expression are known in the art and include many immortalized 
cell lines available from the American Type Culture Collection (ATCC), including but not limited to, 
Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) cells, monkey kidney cells 
(COS), human hepatocellular carcinoma cells (eg. Hep G2), and a number of other cell lines. 
ii. Baculovirus Systems 

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector, and is 
operably linked to the control elements within that vector. Vector construction employs techniques which 
are known in the art. Generally, the components of the expression system include a transfer vector, usually a 
bacterial plasmid, which contains both a fragment of the baculovirus genome, and a convenient restriction 
site for insertion of the heterologous gene or genes to be expressed; a wild type baculovirus with a sequence 
homologous to the baculovirus-specific fragment in the transfer vector (this allows for the homologous 
recombination of the heterologous gene into the baculovirus genome); and appropriate insect host cells and
growth media. 

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the wild type 
viral genome are transfected into an insect host cell where the vector and viral genome are allowed to 
recombine. The packaged recombinant virus is expressed and recombinant plaques are identified and 
purified. Materials and methods for baculovirus/insect cell expression systems are commercially available 
in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit). These techniques are generally 
known to those skilled in the art and fully described in Summers & Smith, Texas Agricultural Experiment
Station Bulletin No. 1555 (1987) ("Summers & Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above-described
components, comprising a promoter, leader (if desired), coding sequence, and transcription termination
sequence, are usually assembled into an intermediate transplacement construct (transfer vector). This may
contain a single gene and operably linked regulatory elements; multiple genes, each with its own set of
operably linked regulatory elements; or multiple genes, regulated by the same set of regulatory elements.
Intermediate transplacement constructs are often maintained in a replicon, such as an extra-chromosomal
element (e.g. plasmids) capable of stable maintenance in a host, such as a bacterium. The replicon will have
a replication system, thus allowing it to be maintained in a suitable host for cloning and amplification.

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is pAc373.
Many other vectors, known to those of skill in the art, have also been designed. These include, for example,
pVL985 (which alters the polyhedrin start codon from ATG to ATT, and which introduces a BamHI
cloning site 32 basepairs downstream from the ATT; see Luckow and Summers, Virology (1989) 170:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann. Rev.
Microbiol. 42:117) and a prokaryotic ampicillin-resistance (amp) gene and origin of replication for
selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any DNA
sequence capable of binding a baculovirus RNA polymerase and initiating the downstream (5' to 3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription
initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site. A
baculovirus transfer vector may also have a second domain called an enhancer, which, if present, is usually
distal to the structural gene. Expression may be either regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly useful
promoter sequences. Examples include sequences derived from the gene encoding the viral polyhedrin
protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in: The Molecular Biology
of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155 476; and the gene encoding the
p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or baculovirus
proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene, 73:409). Alternatively,
since the signals for mammalian cell posttranslational modifications (such as signal peptide cleavage,
proteolytic cleavage, and phosphorylation) appear to be recognized by insect cells, and the signals required
for secretion and nuclear accumulation also appear to be conserved between the invertebrate cells and
vertebrate cells, leaders of non-insect origin, such as those derived from genes encoding human
α-interferon, Maeda et al., (1985), Nature 315:592; human gastrin-releasing peptide, Lebacq-Verheyden et al.,
(1988), Molec. Cell Biol. 8:3129; human IL-2, Smith et al., (1985) Proc. Natl Acad. Sci. USA, 82:8404;
mouse IL-3, Miyajima et al., (1987) Gene 55:273; and human glucocerebrosidase, Martin et al. (1988)
DNA, 7:99, can also be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed with the
proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused foreign proteins
usually requires heterologous genes that ideally have a short leader sequence containing suitable translation
initiation signals preceding an ATG start signal. If desired, methionine at the N-terminus may be cleaved
from the mature protein by in vitro incubation with cyanogen bromide.

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted from
the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised of a leader
sequence fragment that provides for secretion of the foreign protein in insects. The leader sequence
fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the
translocation of the protein into the endoplasmic reticulum.

After insertion of the DNA sequence and/or the gene encoding the expression product precursor of the
protein, an insect cell host is co-transformed with the heterologous DNA of the transfer vector and the
genomic DNA of wild type baculovirus, usually by co-transfection. The promoter and transcription
termination sequence of the construct will usually comprise a 2-5 kb section of the baculovirus genome.
Methods for introducing heterologous DNA into the desired site in the baculovirus are known in the
art. (See Summers & Smith supra; Ju et al. (1987); Smith et al., Mol Cell Biol (1983) 3:2156; and Luckow
and Summers (1989)). For example, the insertion can be into a gene such as the polyhedrin gene, by
homologous double crossover recombination; insertion can also be into a restriction enzyme site engineered
into the desired baculovirus gene. Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in
place of the polyhedrin gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific
sequences and is positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious recombinant
baculovirus. Homologous recombination occurs at low frequency (between about 1% and about 5%); thus,
the majority of the virus produced after cotransfection is still wild-type virus. Therefore, a method is
necessary to identify recombinant viruses. An advantage of the expression system is a visual screen
allowing recombinant viruses to be distinguished. The polyhedrin protein, which is produced by the native
virus, is produced at very high levels in the nuclei of infected cells at late times after viral infection.
Accumulated polyhedrin protein forms occlusion bodies that also contain embedded particles. These
occlusion bodies, up to 15 μm in size, are highly refractile, giving them a bright shiny appearance that is
readily visualized under the light microscope. Cells infected with recombinant viruses lack occlusion
bodies. To distinguish recombinant virus from wild-type virus, the transfection supernatant is plaqued onto
a monolayer of insect cells by techniques known to those skilled in the art, and the plaques are
screened under the light microscope for the presence (indicative of wild-type virus) or absence (indicative
of recombinant virus) of occlusion bodies. "Current Protocols in Molecular Biology" Vol. 2 (Ausubel et al.
eds) at 16.8 (Supp. 10, 1990); Summers & Smith, supra; Miller et al. (1989).
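Because only about 1% to 5% of the progeny virus is recombinant, it helps to estimate how many plaques must be screened before at least one occlusion-body-negative plaque is likely to be seen. Treating each plaque as an independent trial with recombinant probability p (a simplifying assumption), the chance of seeing at least one recombinant among n plaques is 1 - (1 - p)^n. The Python sketch below inverts this for a chosen confidence level; the helper names are hypothetical.

```python
import math


def prob_at_least_one(p, n):
    """Probability of at least one recombinant among n independent plaques."""
    return 1.0 - (1.0 - p) ** n


def plaques_needed(p, confidence=0.95):
    """Smallest n for which prob_at_least_one(p, n) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


for freq in (0.01, 0.05):
    print(freq, plaques_needed(freq))
```

Under these assumptions, roughly 299 plaques would need to be screened at a 1% recombination frequency, and roughly 59 at 5%, to have a 95% chance of seeing at least one recombinant.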

Recombinant baculovirus expression vectors have been developed for infection into several insect cells. For
example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti, Autographa
californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni (WO
89/046699; Carbonell et al., (1985) J. Virol 55:153; Wright (1986) Nature 321:718; Smith et al., (1983)
Mol Cell Biol 3:2156; and see generally, Fraser, et al (1989) In Vitro Cell Dev. Biol 25:225).
Cells and cell culture media are commercially available for both direct and fusion expression of
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally known to
those skilled in the art. See, eg. Summers & Smith supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for stable
maintenance of the plasmid(s) present in the modified insect host. Where the expression product gene is
under inducible control, the host may be grown to high density, and expression induced. Alternatively,
where expression is constitutive, the product will be continuously expressed into the medium and the
nutrient medium must be continuously circulated, while removing the product of interest and augmenting
depleted nutrients. The product may be purified by such techniques as chromatography, eg. HPLC, affinity
chromatography, ion exchange chromatography, etc.; electrophoresis; density gradient centrifugation;
solvent extraction, etc. As appropriate, the product may be further purified so as to remove
substantially any insect proteins which are also present in the medium, so as to provide a product which is at
least substantially free of host debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are incubated
under conditions which allow expression of the recombinant protein encoding sequence. These conditions
will vary, depending upon the host cell selected. However, the conditions are readily ascertainable to those
of ordinary skill in the art, based upon what is known in the art.

iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art. Exemplary
plant cellular genetic expression systems include those described in patents, such as: US 5,693,506; US
5,659,122; and US 5,608,143. Additional examples of genetic expression in plant cell culture have been
described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions of plant protein signal peptides may
be found in addition to the references described above in Baulcombe et al., Mol Gen. Genet. 209:33-40
(1987); Chandler et al., Plant Molecular Biology 3:407-418 (1984); Rogers, J. Biol Chem. 260:3731-3738
(1985); Rothstein et al., Gene 55:353-356 (1987); Whittier et al., Nucleic Acids Research 15:2515-2535
(1987); Wirsel et al., Molecular Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A
description of the regulation of plant gene expression by the phytohormone gibberellic acid, and of secreted
enzymes induced by gibberellic acid, can be found in R.L. Jones and J. MacMillan, Gibberellins, in:
Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038 (1990);
Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl Acad. Sci. 84:1337-1339 (1987).
Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an
expression cassette comprising genetic regulatory elements designed for operation in plants. The expression
cassette is inserted into a desired expression vector with companion sequences upstream and downstream
from the expression cassette suitable for expression in a plant host. The companion sequences will be of
plasmid or viral origin and provide necessary characteristics to the vector to permit the vectors to move
DNA from an original cloning host, such as bacteria, to the desired plant host. The basic bacterial/plant
vector construct will preferably provide a broad host range prokaryote replication origin; a prokaryote
selectable marker; and, for Agrobacterium transformations, T-DNA sequences for Agrobacterium-mediated
transfer to plant chromosomes. Where the heterologous gene is not readily amenable to detection, the
construct will preferably also have a selectable marker gene suitable for determining if a plant cell has been
transformed. A general review of suitable markers, for example for the members of the grass family, is
found in Wilmink and Dons, 1993, Plant Mol Biol Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome are also 
recommended. These might include transposon sequences and the like for homologous recombination as 
well as Ti sequences which permit random insertion of a heterologous expression cassette into a plant 
genome. Suitable prokaryote selectable markers include resistance toward antibiotics such as ampicillin or 
tetracycline. Other DNA sequences encoding additional functions may also be present in the vector, as is 
known in the art. 

The nucleic acid molecules of the subject invention may be included in an expression cassette for
expression of the protein(s) of interest. Usually, there will be only one expression cassette, although two or
more are feasible. In addition to the heterologous protein-encoding sequence, the recombinant expression
cassette will contain the following elements: a promoter region, plant 5' untranslated sequences, an initiation
codon (depending upon whether or not the structural gene comes equipped with one), and a transcription and
translation termination sequence. Unique restriction enzyme sites at the 5' and 3' ends of the cassette allow
for easy insertion into a pre-existing vector.
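Whether a restriction site is unique in a cassette, as required above for easy insertion into a pre-existing vector, can be checked by searching both strands. In this Python sketch the enzyme sites (BamHI GGATCC, EcoRI GAATTC) and the cassette sequence are illustrative examples only, and the helper names are hypothetical.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_positions(seq, site):
    """Start positions of `site` on either strand, in top-strand coordinates."""
    seq = seq.upper()
    positions = set()
    # Using a set of the two orientations counts palindromic sites only once.
    for motif in {site.upper(), reverse_complement(site)}:
        start = seq.find(motif)
        while start != -1:
            positions.add(start)
            start = seq.find(motif, start + 1)
    return sorted(positions)


def is_unique_site(seq, site):
    return len(site_positions(seq, site)) == 1


# Hypothetical cassette: BamHI site, short ORF fragment, EcoRI site.
cassette = "GGATCC" + "ATGAAACCC" + "GAATTC"
print(is_unique_site(cassette, "GGATCC"))
print(is_unique_site(cassette + "GGATCC", "GGATCC"))
```

Searching the reverse complement matters for non-palindromic recognition sequences, which can occur on the bottom strand only.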
A heterologous coding sequence may be for any protein relating to the present invention. The sequence
encoding the protein of interest will encode a signal peptide which allows processing and translocation of
the protein, as appropriate, and will usually lack any sequence which might result in the binding of the
desired protein of the invention to a membrane. Since, for the most part, the transcriptional initiation region
will be for a gene which is expressed and translocated during germination, by employing the signal peptide
which provides for translocation, one may also provide for translocation of the protein of interest. In this
way, the protein(s) of interest will be translocated from the cells in which they are expressed and may be
efficiently harvested. Typically, secretion in seeds is across the aleurone or scutellar epithelium layer into
the endosperm of the seed. While it is not required that the protein be secreted from the cells in which the
protein is produced, this facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eukaryotic cell, it is desirable to
determine whether any portion of the cloned gene contains sequences which will be processed out as introns
by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron" region may be
conducted to prevent losing a portion of the genetic message as a false intron code, Reed and Maniatis, Cell
41:95-105, 1985.

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically transfer the
recombinant DNA. Crossway, Mol. Gen. Genet, 202:179-185, 1985. The genetic material may also be
transferred into the plant cell by using polyethylene glycol, Krens, et al., Nature, 296, 72-74, 1982. Another
method of introduction of nucleic acid segments is high velocity ballistic penetration by small particles with
the nucleic acid either within the matrix of small beads or particles, or on the surface, Klein, et al., Nature,
327, 70-73, 1987 and Knudsen and Muller, 1991, Planta, 185:330-336 teaching particle bombardment of
barley endosperm to create transgenic barley. Yet another method of introduction would be fusion of
protoplasts with other entities, either minicells, cells, liposomes or other fusible lipid-surfaced bodies,
Fraley, et al., Proc. Natl Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation. (Fromm et al., Proc. Natl Acad.
Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the presence of plasmids
containing the gene construct. Electrical impulses of high field strength reversibly permeabilize
biomembranes allowing the introduction of the plasmids. Electroporated plant protoplasts reform the cell
wall, divide, and form plant callus.

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can be
transformed by the present invention so that whole plants are recovered which contain the transferred gene.
It is known that practically all plants can be regenerated from cultured cells or tissues, including but not
limited to all major species of sugarcane, sugar beet, cotton, fruit and other trees, legumes and vegetables.
Some suitable plants include, for example, species from the genera Fragaria, Lotus, Medicago, Onobrychis,
Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica,
Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia,
Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis,
Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia,
Glycine, Lolium, Zea, Triticum, and Sorghum.

Means for regeneration vary from species to species of plants, but generally a suspension of transformed 
protoplasts containing copies of the heterologous gene is first provided. Callus tissue is formed and shoots 
may be induced from callus and subsequently rooted. Alternatively, embryo formation can be induced from 
the protoplast suspension. These embryos germinate as natural embryos to form plants. The culture media 
will generally contain various amino acids and hormones, such as auxin and cytokinins. It is also 
advantageous to add glutamic acid and proline to the medium, especially for such species as corn and 
alfalfa. Shoots and roots normally develop simultaneously. Efficient regeneration will depend on the 
medium, on the genotype, and on the history of the culture. If these three variables are controlled, then 
regeneration is fully reproducible and repeatable. 

In some plant cell culture systems, the desired protein of the invention may be excreted or alternatively, the 
protein may be extracted from the whole plant. Where the desired protein of the invention is secreted into 
the medium, it may be collected. Alternatively, the embryos and embryoless-half seeds or other plant tissue 
may be mechanically disrupted to release any secreted protein between cells and tissues. The mixture may 
be suspended in a buffer solution to retrieve soluble proteins. Conventional protein isolation and
purification methods will then be used to purify the recombinant protein. Parameters of time, temperature,
pH, oxygen, and volumes will be adjusted through routine methods to optimize expression and recovery of
heterologous protein.
iv. Bacterial Systems 

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence capable of
binding bacterial RNA polymerase and initiating the downstream (3') transcription of a coding sequence
(eg. structural gene) into mRNA. A promoter will have a transcription initiation region which is usually
placed proximal to the 5' end of the coding sequence. This transcription initiation region usually includes an
RNA polymerase binding site and a transcription initiation site. A bacterial promoter may also have a
second domain called an operator, that may overlap an adjacent RNA polymerase binding site at which
RNA synthesis begins. The operator permits negatively regulated (inducible) transcription, as a gene
repressor protein may bind the operator and thereby inhibit transcription of a specific gene. Constitutive
expression may occur in the absence of negative regulatory elements, such as the operator. In addition,
positive regulation may be achieved by a gene activator protein binding sequence, which, if present, is
usually proximal (5') to the RNA polymerase binding sequence. An example of a gene activator protein is
the catabolite activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli
(E. coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:113]. Regulated expression may therefore be either
positive or negative, thereby either enhancing or reducing transcription.
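The negative (repressor/operator) and positive (CAP) controls described above can be summarised as a small truth table. The Python sketch below is a deliberately coarse, qualitative model under stated assumptions: an inducer is assumed to inactivate the repressor, and "low" versus "high" only indicates the direction of the CAP effect, not measured expression levels. The function name is hypothetical.

```python
def lac_type_expression(repressor_present, inducer_present, cap_active):
    """Qualitative output ('off', 'low' or 'high') of a lac-type promoter."""
    # Negative control: an active repressor bound at the operator blocks
    # transcription; the inducer is assumed to release it.
    if repressor_present and not inducer_present:
        return "off"
    # Positive control: CAP bound upstream of the polymerase site enhances
    # initiation. With no repressor, expression is constitutive.
    return "high" if cap_active else "low"


for repressor, inducer, cap in [(True, False, True), (True, True, False),
                                (True, True, True), (False, False, False)]:
    print(repressor, inducer, cap, lac_type_expression(repressor, inducer, cap))
```

The last row illustrates the point made above that expression becomes constitutive when the negative regulatory element is absent.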

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples
include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose (lac)
[Chang et al. (1977) Nature 198:1056], and maltose. Additional examples include promoter sequences
derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al. (1980) Nuc. Acids Res. 8:4057;
Yelverton et al. (1981) Nucl Acids Res. 9:731; US patent 4,738,921; EP-A-0036776 and EP-A-0121775].
The β-lactamase (bla) promoter system [Weissmann (1981) "The cloning of interferon and other mistakes."
In Interferon 3 (ed. I. Gresser)], bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5
[US patent 4,689,406] promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. For
example, transcription activation sequences of one bacterial or bacteriophage promoter may be joined with
the operon sequences of another bacterial or bacteriophage promoter, creating a synthetic hybrid promoter
[US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac promoter comprised of both trp
promoter and lac operon sequences that is regulated by the lac repressor [Amann et al. (1983) Gene 25:167;
de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21]. Furthermore, a bacterial promoter can include naturally
occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and
initiate transcription. A naturally occurring promoter of non-bacterial origin can also be coupled with a
compatible RNA polymerase to produce high levels of expression of some genes in prokaryotes. The
bacteriophage T7 RNA polymerase/promoter system is an example of a coupled promoter system [Studier
et al. (1986) J. Mol. Biol. 189:113; Tabor et al. (1985) Proc Natl Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region (EPO-A-0 267
851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for the
expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the Shine-Dalgarno
(SD) sequence and includes an initiation codon (ATG) and a sequence 3-9 nucleotides in length
located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975) Nature 254:34]. The SD
sequence is thought to promote binding of mRNA to the ribosome by the pairing of bases between the SD
sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979) "Genetic signals and nucleotide sequences
in messenger RNA." In Biological Regulation and Development: Gene Expression (ed. R.F. Goldberger)].
For the expression of eukaryotic genes and of prokaryotic genes with a weak ribosome-binding site, see
[Sambrook et al. (1989) "Expression of cloned genes in Escherichia coli." In Molecular Cloning: A
Laboratory Manual].
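The spacing rule above (an SD sequence 3-9 nucleotides long, ending 3-11 nucleotides upstream of the ATG) can be checked mechanically. The following is a minimal sketch only: the text gives no SD sequence, so the core `AGGAGG` used here is an assumption, as are the function name `find_sd_sites` and the default minimum motif length of 4 (chosen to reduce spurious short matches).

```python
# Assumed SD core; the text above specifies only the motif length and spacing.
SD_CONSENSUS = "AGGAGG"


def find_sd_sites(seq, min_len=4, min_spacer=3, max_spacer=11):
    """Return (atg_index, sd_index, sd_motif, spacer) for each ATG that has an
    SD-like motif ending min_spacer..max_spacer nucleotides upstream of it."""
    seq = seq.upper().replace("U", "T")  # accept mRNA (U) or DNA (T) input
    # Every contiguous piece of the consensus at least min_len long, longest first.
    motifs = sorted(
        {SD_CONSENSUS[i:j]
         for i in range(len(SD_CONSENSUS))
         for j in range(i + min_len, len(SD_CONSENSUS) + 1)},
        key=lambda m: (-len(m), m),
    )
    hits = []
    atg = seq.find("ATG")
    while atg != -1:
        for motif in motifs:
            match = None
            for spacer in range(min_spacer, max_spacer + 1):
                end = atg - spacer           # one past the motif's last base
                start = end - len(motif)
                if start >= 0 and seq[start:end] == motif:
                    match = (atg, start, motif, spacer)
                    break
            if match:
                hits.append(match)
                break                        # keep only the longest motif per ATG
        atg = seq.find("ATG", atg + 1)
    return hits


print(find_sd_sites("GCAAGGAGGTAACATATGCGT"))
```

For the hypothetical input shown, the full `AGGAGG` core starts at position 3 and ends 6 nucleotides before the ATG at position 15, which falls inside the 3-11 nucleotide window.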
A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked with the
DNA molecule, in which case the first amino acid at the N-terminus will always be a methionine, which is
encoded by the ATG start codon. If desired, methionine at the N-terminus may be cleaved from the protein
by in vitro incubation with cyanogen bromide or by either in vivo or in vitro incubation with a bacterial
methionine N-terminal peptidase (EPO-A-0 219 237).
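Cyanogen bromide cleaves on the C-terminal side of every methionine residue, not only the initiator methionine. The following minimal sketch models an idealised, complete digest of a one-letter protein sequence and checks whether the N-terminal methionine is the only one present. The function names and example sequences are hypothetical, and the conversion of the cleaved methionine to homoserine lactone is not modelled.

```python
def cnbr_fragments(protein):
    """Idealised complete cyanogen bromide digest of a one-letter protein sequence.

    CNBr cleaves C-terminal to every methionine (which becomes homoserine
    lactone; it is still written as 'M' here for simplicity)."""
    fragments, current = [], []
    for residue in protein:
        current.append(residue)
        if residue == "M":
            fragments.append("".join(current))
            current = []
    if current:
        fragments.append("".join(current))
    return fragments


def only_n_terminal_met(protein):
    """True when the sole methionine is the initiator, so CNBr removes just that residue."""
    return protein.startswith("M") and "M" not in protein[1:]


print(cnbr_fragments("MKTMAY"))       # internal Met gives an extra cut
print(only_n_terminal_met("MKTAYIAK"))
```

In this simplified model, a protein containing internal methionines would be fragmented by the treatment described above.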

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end of
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two amino acid
sequences. For example, the bacteriophage lambda cII gene can be linked at the 5' terminus of a foreign
gene and expressed in bacteria. The resulting fusion protein preferably retains a site for a processing
enzyme (factor Xa) to cleave the bacteriophage protein from the foreign protein [Nagai et al (1984) Nature
309:810]. Fusion proteins can also be made with sequences from the lacZ [Jia et al. (1987) Gene 50:197],
trpE [Allen et al (1987) J. Biotechnol. 5:93; Makoff et al (1989) J. Gen. Microbiol 135:11], and cheY
[EP-A-0 324 647] genes. The DNA sequence at the junction of the two amino acid sequences may or may
not encode a cleavable site. Another example is a ubiquitin fusion protein. Such a fusion protein is made
with the ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method, native foreign
protein can be isolated [Miller et al (1989) Bio/Technology 7:698].
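The factor Xa processing step mentioned above can be illustrated at the sequence level. Factor Xa recognises Ile-(Glu/Asp)-Gly-Arg and cuts after the Arg. The following minimal sketch splits a one-letter fusion sequence at the first such site. The function name and the example fusion sequence are hypothetical, and the sketch ignores site accessibility and secondary cleavage, both of which affect real digests.

```python
import re

# Factor Xa recognition motif Ile-(Glu/Asp)-Gly-Arg; cleavage follows the Arg.
FACTOR_XA_SITE = re.compile(r"I[ED]GR")


def factor_xa_release(fusion):
    """Split a one-letter fusion sequence at the first factor Xa site.

    Returns (carrier_part, released_protein)."""
    match = FACTOR_XA_SITE.search(fusion)
    if match is None:
        raise ValueError("no factor Xa recognition site in sequence")
    return fusion[:match.end()], fusion[match.end():]


carrier, released = factor_xa_release("MSTKKIEGRMAYQ")
print(carrier, released)
```

Because cleavage occurs after the Arg of the recognition site, no residues from the site remain on the released protein.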

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules that
encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion of the
foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes a signal
peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell. The
protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space,
located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably, processing
sites that can be cleaved either in vivo or in vitro are encoded between the signal peptide fragment and the
foreign gene.

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, such as
the E. coli outer membrane protein gene (ompA) [Masui et al (1983), in: Experimental Manipulation of
Gene Expression; Ghrayeb et al (1984) EMBO J. 3:2437] and the E. coli alkaline phosphatase signal
sequence (phoA) [Oka et al (1985) Proc. Natl Acad. Sci. 82:7212]. As an additional example, the signal
sequence of the alpha-amylase gene from various Bacillus strains can be used to secrete heterologous
proteins from B. subtilis [Palva et al (1982) Proc. Natl Acad. Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 3' to the
translation stop codon, and thus together with the promoter flank the coding sequence. These sequences
direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA.
Transcription termination sequences frequently include DNA sequences of about 50 nucleotides capable of
forming stem loop structures that aid in terminating transcription. Examples include transcription
termination sequences derived from genes with strong promoters, such as the trp gene in E. coli as well as
other biosynthetic genes.

Usually, the above described components, comprising a promoter, signal sequence (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression constructs.
Expression constructs are often maintained in a replicon, such as an extrachromosomal element (eg.
plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will have a replication
system, thus allowing it to be maintained in a prokaryotic host either for expression or for cloning and
amplification. In addition, a replicon may be either a high or low copy number plasmid. A high copy
number plasmid will generally have a copy number ranging from about 5 to about 200, and usually about 10
to about 150. A host containing a high copy number plasmid will preferably contain at least about 10, and
more preferably at least about 20 plasmids. Either a high or low copy number vector may be selected,
depending upon the effect of the vector and the foreign protein on the host.

Alternatively, the expression constructs can be integrated into the bacterial genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to the bacterial chromosome
that allows the vector to integrate. Integrations appear to result from recombinations between homologous
DNA in the vector and the bacterial chromosome. For example, integrating vectors constructed with DNA
from various Bacillus strains integrate into the Bacillus chromosome (EP-A-0 127 328). Integrating vectors
may also be comprised of bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow
for the selection of bacterial strains that have been transformed. Selectable markers can be expressed in the
bacterial host and may include genes which render bacteria resistant to drugs such as ampicillin,
chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline [Davies et al. (1978) Annu. Rev.
Microbiol. 32:469]. Selectable markers may also include biosynthetic genes, such as those in the histidine,
tryptophan, and leucine biosynthetic pathways.

Alternatively, some of the above described components can be put together in transformation vectors.
Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon
or developed into an integrating vector, as described above.

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors, have
been developed for transformation into many bacteria. For example, expression vectors have been
developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc. Natl. Acad. Sci.
USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia coli [Shimatake et al.
(1981) Nature 292:128; Amann et al (1985) Gene 40:183; Studier et al (1986) J. Mol. Biol. 189:113;
EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907], Streptococcus cremoris [Powell et al (1988) Appl.
Environ. Microbiol. 54:655]; Streptococcus lividans [Powell et al (1988) Appl. Environ. Microbiol.
54:655], Streptomyces lividans [US patent 4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually include
either the transformation of bacteria treated with CaCl2 or other agents, such as divalent cations and DMSO.
DNA can also be introduced into bacterial cells by electroporation. Transformation procedures usually vary
with the bacterial species to be transformed. See eg. [Masson et al (1989) FEMS Microbiol. Lett. 60:273;
Palva et al (1982) Proc. Natl Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO
84/04541; Bacillus], [Miller et al (1988) Proc. Natl Acad. Sci. 85:856; Wang et al (1990) J. Bacteriol.
172:949; Campylobacter], [Cohen et al (1973) Proc. Natl Acad. Sci. 69:2110; Dower et al (1988) Nucleic
Acids Res. 16:6127; Kushner (1978) "An improved method for transformation of Escherichia coli with
ColE1-derived plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al (1970) J. Mol. Biol. 53:159; Taketo (1988)
Biochim. Biophys. Acta 949:318; Escherichia], [Chassy et al (1987) FEMS Microbiol. Lett. 44:173;
Lactobacillus]; [Fiedler et al (1988) Anal. Biochem. 170:38; Pseudomonas]; [Augustin et al (1990) FEMS
Microbiol. Lett. 66:203; Staphylococcus], [Barany et al (1980) J. Bacteriol. 144:698; Harlander (1987)
"Transformation of Streptococcus lactis by electroporation." In: Streptococcal Genetics (ed. J. Ferretti and R.
Curtiss III); Perry et al (1981) Infect. Immun. 32:1295; Powell et al (1988) Appl. Environ. Microbiol.
54:655; Somkuti et al (1987) Proc. 4th Eur. Cong. Biotechnology 1:412; Streptococcus].
v. Yeast Expression 

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any DNA
sequence capable of binding yeast RNA polymerase and initiating the downstream (3') transcription of a
coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation region
which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region
usually includes an RNA polymerase binding site (the "TATA Box") and a transcription initiation site. A
yeast promoter may also have a second domain called an upstream activator sequence (UAS), which, if
present, is usually distal to the structural gene. The UAS permits regulated (inducible) expression.
Constitutive expression occurs in the absence of a UAS. Regulated expression may be either positive or
negative, thereby either enhancing or reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding enzymes in
the metabolic pathway provide particularly useful promoter sequences. Examples include alcohol
dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-phosphate isomerase,
glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase, phosphofructokinase,
3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203). The yeast PHO5 gene, encoding
acid phosphatase, also provides useful promoter sequences [Miyanohara et al (1983) Proc. Natl. Acad. Sci.
USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For
example, UAS sequences of one yeast promoter may be joined with the transcription activation region of
another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid promoters include
the ADH regulatory sequence linked to the GAP transcription activation region (US Patent Nos. 4,876,197
and 4,880,734). Other examples of hybrid promoters include promoters which consist of the regulatory
sequences of either the ADH2, GAL4, GAL10, or PHO5 genes, combined with the transcriptional
activation region of a glycolytic enzyme gene such as GAP or PyK (EP-A-0 164 556). Furthermore, a yeast
promoter can include naturally occurring promoters of non-yeast origin that have the ability to bind yeast
RNA polymerase and initiate transcription. Examples of such promoters include, inter alia, those described in
[Cohen et al (1980) Proc. Natl Acad. Sci. USA 77:1078; Henikoff et al (1981) Nature 253:835; Hollenberg
et al (1981) Curr. Topics Microbiol. Immunol. 96:119; Hollenberg et al (1979) "The Expression of Bacterial
Antibiotic Resistance Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental
and Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercereau-Puijalon et al (1980) Gene 11:163;
Panthier et al (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly linked 
with the DNA molecule, in which case the first amino acid at the N-terminus of the recombinant protein 
will always be a methionine, which is encoded by the ATG start codon. If desired, methionine at the N- 
terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide. 
Fusion proteins provide an alternative for yeast expression systems, as well as for mammalian, baculovirus,
and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal portion of an
endogenous yeast protein, or other stable protein, is fused to the 5' end of heterologous coding sequences.
Upon expression, this construct will provide a fusion of the two amino acid sequences. For example, the
yeast or human superoxide dismutase (SOD) gene can be linked at the 5' terminus of a foreign gene and
expressed in yeast. The DNA sequence at the junction of the two amino acid sequences may or may not
encode a cleavable site. See eg. EP-A-0 196 056. Another example is a ubiquitin fusion protein. Such a
fusion protein is made with the ubiquitin region that preferably retains a site for a processing enzyme (eg.
ubiquitin-specific processing protease) to cleave the ubiquitin from the foreign protein. Through this
method, therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric
DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for
secretion in yeast of the foreign protein. Preferably, there are processing sites encoded between the leader
fragment and the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence fragment
usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the
protein from the cell.

WO 2004/018646 PCT/US2003/026827

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins, such as the
yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the a-factor gene (US patent 4,588,684).
Alternatively, leaders of non-yeast origin that also provide for secretion in yeast exist, such as an interferon
leader (EP-A-0 060 057).

Preferred secretion leaders are those that employ a fragment of the yeast alpha-factor gene, which
contains both a "pre" signal sequence and a "pro" region. The types of alpha-factor fragments that can be
employed include the full-length pre-pro alpha-factor leader (about 83 amino acid residues) as well as
truncated alpha-factor leaders (usually about 25 to about 50 amino acid residues) (US Patents 4,546,083 and
4,870,008; EP-A-0 324 274). Additional leaders employing an alpha-factor leader fragment that provides
for secretion include hybrid alpha-factor leaders made with a presequence of a first yeast, but a pro-region
from a second yeast alpha-factor (see eg. WO 89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3' to the
translation stop codon, and thus together with the promoter flank the coding sequence. These sequences
direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA.
Examples include transcription terminator sequences and other yeast-recognized termination sequences,
such as those from genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding sequence of
interest, and transcription termination sequence, are put together into expression constructs. Expression
constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable
of stable maintenance in a host, such as yeast or bacteria. The replicon may have two replication systems,
thus allowing it to be maintained, for example, in yeast for expression and in a prokaryotic host for cloning
and amplification. Examples of such yeast-bacteria shuttle vectors include YEp24 [Botstein et al (1979)
Gene 8:17-24], pC1/1 [Brake et al (1984) Proc. Natl Acad. Sci USA 81:4642-4646], and YRp17
[Stinchcomb et al (1982) J. Mol Biol 158:151]. In addition, a replicon may be either a high or low copy
number plasmid. A high copy number plasmid will generally have a copy number ranging from about 5 to
about 200, and usually about 10 to about 150. A host containing a high copy number plasmid will
preferably have at least about 10, and more preferably at least about 20, copies. Either a high or low copy
number vector may be selected, depending upon the effect of the vector and the foreign protein on the host.
See eg. Brake et al, supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating vector.
Integrating vectors usually contain at least one sequence homologous to a yeast chromosome that allows the
vector to integrate, and preferably contain two homologous sequences flanking the expression construct.
Integrations appear to result from recombinations between homologous DNA in the vector and the yeast
chromosome [Orr-Weaver et al (1983) Methods in Enzymol 101:228-245]. An integrating vector may be
directed to a specific locus in yeast by selecting the appropriate homologous sequence for inclusion in the
vector. See Orr-Weaver et al, supra. One or more expression constructs may integrate, possibly affecting
levels of recombinant protein produced [Rine et al (1983) Proc. Natl Acad. Sci USA 80:6750]. The
chromosomal sequences included in the vector can occur either as a single segment in the vector, which
results in the integration of the entire vector, or as two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable integration
of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow
for the selection of yeast strains that have been transformed. Selectable markers may include biosynthetic
genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2, and TRP1, as well as ALG7 and
the G418 resistance gene, which confer resistance in yeast cells to tunicamycin and G418, respectively. In
addition, a suitable selectable marker may also provide yeast with the ability to grow in the presence of toxic
compounds, such as metal. For example, the presence of CUP1 allows yeast to grow in the presence of
copper ions [Butt et al (1987) Microbiol Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation vectors.
Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon
or developed into an integrating vector, as described above.

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors, have been
developed for transformation into many yeasts. For example, expression vectors have been developed for,
inter alia, the following yeasts: Candida albicans [Kurtz et al (1986) Mol Cell Biol 6:142], Candida
maltosa [Kunze et al (1985) J. Basic Microbiol 25:141], Hansenula polymorpha [Gleeson et al (1986) J.
Gen. Microbiol 132:3459; Roggenkamp et al (1986) Mol Gen. Genet. 202:302], Kluyveromyces fragilis
[Das et al (1984) J. Bacteriol 158:1165], Kluyveromyces lactis [De Louvencourt et al (1983) J.
Bacteriol 154:131; Van den Berg et al (1990) Bio/Technology 8:135], Pichia guilliermondii [Kunze et al
(1985) J. Basic Microbiol 25:141], Pichia pastoris [Cregg et al (1985) Mol Cell Biol 5:3376; US Patent
Nos. 4,837,148 and 4,929,555], Saccharomyces cerevisiae [Hinnen et al (1978) Proc. Natl Acad. Sci. USA
75:1929; Ito et al. (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse (1981)
Nature 300:106], and Yarrowia lipolytica [Davidow et al (1985) Curr. Genet. 10:380471; Gaillardin et al
(1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually include
either the transformation of spheroplasts or of intact yeast cells treated with alkali cations. Transformation
procedures usually vary with the yeast species to be transformed. See eg. [Kurtz et al (1986) Mol Cell
Biol 6:142; Kunze et al (1985) J. Basic Microbiol 25:141; Candida]; [Gleeson et al (1986) J. Gen.
Microbiol 132:3459; Roggenkamp et al (1986) Mol Gen. Genet 202:302; Hansenula]; [Das et al (1984)
J. Bacteriol 158:1165; De Louvencourt et al. (1983) J. Bacteriol 154:1165; Van den Berg et al. (1990)
Bio/Technology 8:135; Kluyveromyces]; [Cregg et al (1985) Mol. Cell Biol 5:3376; Kunze et al (1985) J.
Basic Microbiol 25:141; US Patent Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al (1978) Proc.
Natl Acad. Sci. USA 75:1929; Ito et al (1983) J. Bacteriol 153:163; Saccharomyces]; [Beach and Nurse
(1981) Nature 300:706; Schizosaccharomyces]; [Davidow et al (1985) Curr. Genet. 10:39; Gaillardin et al
(1985) Curr. Genet. 10:49; Yarrowia].
Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of at least 
one antibody combining site. An "antibody combining site" is the three-dimensional binding space with an 
internal surface shape and charge distribution complementary to the features of an epitope of an antigen, 
which allows a binding of the antibody with the antigen. "Antibody" includes, for example, vertebrate 
antibodies, hybrid antibodies, chimeric antibodies, humanised antibodies, altered antibodies, univalent 
antibodies, Fab proteins, and single domain antibodies. 

Antibodies against the proteins of the invention are useful for affinity chromatography, immunoassays, and 
distinguishing/identifying Streptococcal proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably a
mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the
volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies.
Immunization is generally performed by mixing or emulsifying the protein in saline, preferably in an
adjuvant such as Freund's complete adjuvant, and injecting the mixture or emulsion parenterally (generally
subcutaneously or intramuscularly). A dose of 50-200 µg/injection is typically sufficient. Immunization is
generally boosted 2-6 weeks later with one or more injections of the protein in saline, preferably using
Freund's incomplete adjuvant. One may alternatively generate antibodies by in vitro immunization using
methods known in the art, which for the purposes of this invention is considered equivalent to in vivo
immunization. Polyclonal antisera are obtained by bleeding the immunized animal into a glass or plastic
container, incubating the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The
serum is recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be
obtained from rabbits.
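Centrifugation conditions such as those above are stated as relative centrifugal force (multiples of g), whereas many centrifuges are set by rotor speed. The following minimal sketch uses the standard conversion RCF = 1.118 × 10⁻⁵ × r × N², where r is the rotor radius in cm and N is the speed in rpm. The function names and the rotor radii used below are hypothetical. The actual radius must be taken from the rotor in use.

```python
import math

# Standard constant for RCF (x g) from radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at radius_cm for a rotor spinning at rpm."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach rcf (x g) at radius_cm."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical 8 cm rotor: speed needed for 1,000 x g.
print(round(rpm_for_rcf(1000, 8)))
```

Because RCF scales with the radius, the same rpm gives different g-forces on different rotors. For this reason, protocols usually quote g rather than rpm.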
Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature (1975)
256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described above. 
However, rather than bleeding the animal to extract serum, the spleen (and optionally several large lymph 
nodes) is removed and dissociated into single cells. If desired, the spleen cells may be screened (after 
removal of nonspecifically adherent cells) by applying a cell suspension to a plate or well coated with the 
protein antigen. B-cells expressing membrane-bound immunoglobulin specific for the antigen bind to the 
plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or all dissociated spleen 
cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a selective 
medium (eg. hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated 
by limiting dilution, and are assayed for production of antibodies which bind specifically to the immunizing 
antigen (and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then 
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice). 
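The limiting-dilution step above relies on plating so few cells per well that most growing wells arise from a single cell. The following minimal sketch shows the usual Poisson reasoning. It assumes that cells are distributed randomly among wells and that every seeded cell grows out, which is a simplification. The function name is hypothetical.

```python
import math


def limiting_dilution_stats(mean_cells_per_well):
    """Poisson expectations for plating at a mean of mean_cells_per_well cells.

    Assumes random distribution of cells and that every seeded cell grows."""
    lam = mean_cells_per_well
    if lam <= 0:
        raise ValueError("mean cells per well must be positive")
    p_empty = math.exp(-lam)            # P(0 cells in a well)
    p_single = lam * math.exp(-lam)     # P(exactly 1 cell)
    p_growth = 1.0 - p_empty            # P(at least 1 cell), i.e. wells with growth
    return {
        "empty": p_empty,
        "single": p_single,
        "growth": p_growth,
        "clonal_given_growth": p_single / p_growth,
    }


for lam in (1.0, 0.3):
    print(lam, round(limiting_dilution_stats(lam)["clonal_given_growth"], 3))
```

Lowering the mean seeding density raises the fraction of growing wells that are single-cell derived, but it also leaves more wells empty. For this reason, clones are often re-plated by limiting dilution a second time.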
If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly ³²P and
¹²⁵I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes are typically
detected by their activity. For example, horseradish peroxidase (HRP) is usually detected by its ability to
convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a spectrophotometer.
"Specific binding partner" refers to a protein capable of binding a ligand molecule with high specificity, as
for example in the case of an antigen and a monoclonal antibody specific therefor. Other specific binding
partners include biotin and avidin or streptavidin, IgG and protein A, and the numerous receptor-ligand
couples known in the art. It should be understood that the above description is not meant to categorize the
various labels into distinct classes, as the same label may serve in several different modes. For example,
¹²⁵I may serve as a radioactive label or as an electron-dense reagent. HRP may serve as an enzyme or as an
antigen for a MAb. Further, one may combine various labels for desired effect. For example, MAbs and
avidin also require labels in the practice of this invention: thus, one might label a MAb with biotin, and
detect its presence with avidin labeled with ¹²⁵I, or with an anti-biotin MAb labeled with HRP. Other
permutations and possibilities will be readily apparent to those of ordinary skill in the art, and are
considered as equivalents within the scope of the instant invention.
Pharmaceutical Compositions 

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the invention. 
The pharmaceutical compositions will comprise a therapeutically effective amount of either polypeptides, 
antibodies, or polynucleotides of the claimed invention. 

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic agent to
treat, ameliorate, or prevent a targeted disease or condition, or to exhibit a detectable therapeutic or
preventative effect. The effect can be detected by, for example, chemical markers or antigen levels.
Therapeutic effects also include reduction in physical symptoms, such as decreased body temperature. The
precise effective amount for a subject will depend upon the subject's size and health, the nature and extent
of the condition, and the therapeutics or combination of therapeutics selected for administration. Thus, it is
not useful to specify an exact effective amount in advance. However, the effective amount for a given
situation can be determined by routine experimentation and is within the judgement of the clinician.
For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to about 50 mg/kg,
or from about 0.05 mg/kg to about 10 mg/kg, of the DNA constructs in the individual to which it is administered.
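The per-kilogram ranges above convert to an absolute amount by multiplying by body weight. The following minimal sketch shows that arithmetic only. The dictionary keys, the function name, and the 70 kg example are hypothetical, and the sketch is not a dosing recommendation. As stated above, the actual amount is determined by routine experimentation and clinical judgement.

```python
# Ranges stated in the text, in mg of DNA construct per kg body weight.
DOSE_RANGES_MG_PER_KG = {"broad": (0.01, 50.0), "narrower": (0.05, 10.0)}


def dose_range_mg(body_weight_kg, which="broad"):
    """Absolute dose range in mg for a subject of the given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low, high = DOSE_RANGES_MG_PER_KG[which]
    return low * body_weight_kg, high * body_weight_kg


# Hypothetical 70 kg subject.
print(dose_range_mg(70))               # about 0.7 mg to 3500 mg
print(dose_range_mg(70, "narrower"))   # about 3.5 mg to 700 mg
```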
A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such as
antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any pharmaceutical
carrier that does not itself induce the production of antibodies harmful to the individual receiving the
composition, and which may be administered without undue toxicity. Suitable carriers may be large, slowly
metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids,
polymeric amino acids, amino acid copolymers, and inactive virus particles. Such carriers are well known to
those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids such as
acetates, propionates, malonates, benzoates, and the like. A thorough discussion of pharmaceutically
acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water, saline,
glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH
buffering substances, and the like, may be present in such vehicles. Typically, the therapeutic compositions
are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or
suspension in, liquid vehicles prior to injection may also be prepared. Liposomes are included within the
definition of a pharmaceutically acceptable carrier.
Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject. The subjects
to be treated can be animals; in particular, human subjects can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously,
intraperitoneally, intravenously or intramuscularly, or delivered to the interstitial space of a tissue. The
compositions can also be administered into a lesion. Other modes of administration include oral and
pulmonary administration, suppositories, and transdermal or transcutaneous applications (eg. see
WO98/20734), needles, and gene guns or hyposprays. Dosage treatment may be a single dose schedule or a
multiple dose schedule.

See also Delivery Strategies for Antisense Oligonucleotide Therapeutics (ed. Akhtar) ISBN 0849347785. 
Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or therapeutic (ie.
to treat disease after infection).

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid,
usually in combination with "pharmaceutically acceptable carriers" which include any carrier that does not
itself induce the production of antibodies harmful to the individual receiving the composition. Suitable
carriers are typically large, slowly metabolized macromolecules such as proteins, polysaccharides,
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, lipid aggregates (such
as oil droplets or liposomes), and inactive virus particles. Such carriers are well known to those of ordinary
skill in the art. Additionally, these carriers may function as immunostimulating agents ("adjuvants").
Furthermore, the antigen or immunogen may be conjugated to a bacterial toxoid, such as a toxoid from
diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Vaccines of the invention may be administered in conjunction with other immunoregulatory 
agents. In particular, compositions will usually include an adjuvant. 

Preferred further adjuvants include, but are not limited to, one or more of the following set forth 
below: 

A. Mineral Containing Compositions

Mineral containing compositions suitable for use as adjuvants in the invention include mineral
salts, such as aluminium salts and calcium salts. The invention includes mineral salts such as
hydroxides (e.g. oxyhydroxides), phosphates (e.g. hydroxyphosphates, orthophosphates),
sulphates, etc. (e.g. see chapters 8 & 9 of ref. 1), or mixtures of different mineral compounds,
with the compounds taking any suitable form (e.g. gel, crystalline, amorphous, etc.), and with
adsorption being preferred. The mineral containing compositions may also be formulated as a
particle of metal salt. See ref. 2.

B. Oil-Emulsions 

Oil-emulsion compositions suitable for use as adjuvants in the invention include squalene-water
emulsions, such as MF59 (5% Squalene, 0.5% Tween 80, and 0.5% Span 85, formulated into
submicron particles using a microfluidizer). See ref. 3.

Complete Freund's adjuvant (CFA) and incomplete Freund's adjuvant (IFA) may also be used as
adjuvants in the invention.




WO 2004/018646 



PCT/US2003/026827 



C. Saponin Formulations 

Saponin formulations may also be used as adjuvants in the invention. Saponins are a
heterogeneous group of sterol glycosides and triterpenoid glycosides that are found in the bark,
leaves, stems, roots and even flowers of a wide range of plant species. Saponins from the bark of
the Quillaia saponaria Molina tree have been widely studied as adjuvants. Saponin can also be
commercially obtained from Smilax ornata (sarsaparilla), Gypsophila paniculata (bride's veil),
and Saponaria officinalis (soap root). Saponin adjuvant formulations include purified
formulations, such as QS21, as well as lipid formulations, such as ISCOMs.

Saponin compositions have been purified using High Performance Thin Layer Chromatography
(HP-TLC) and Reversed Phase High Performance Liquid Chromatography (RP-HPLC). Specific
purified fractions using these techniques have been identified, including QS7, QS17, QS18,
QS21, QH-A, QH-B and QH-C. Preferably, the saponin is QS21. A method of production of
QS21 is disclosed in U.S. Patent No. 5,057,540. Saponin formulations may also comprise a
sterol, such as cholesterol (see WO 96/33739).

Combinations of saponins and cholesterols can be used to form unique particles called
Immunostimulating Complexes (ISCOMs). ISCOMs typically also include a phospholipid such
as phosphatidylethanolamine or phosphatidylcholine. Any known saponin can be used in
ISCOMs. Preferably, the ISCOM includes one or more of Quil A, QHA and QHC. ISCOMs are
further described in EP 0 109 942, WO 96/11711 and WO 96/33739. Optionally, the ISCOMs
may be devoid of additional detergent. See ref. 4.

A review of the development of saponin-based adjuvants can be found in ref. 5.

C. Virosomes and Virus Like Particles (VLPs) 

Virosomes and Virus Like Particles (VLPs) can also be used as adjuvants in the invention.
These structures generally contain one or more proteins from a virus, optionally combined or
formulated with a phospholipid. They are generally non-pathogenic and non-replicating, and
generally do not contain any of the native viral genome. The viral proteins may be recombinantly
produced or isolated from whole viruses. Viral proteins suitable for use in virosomes or
VLPs include proteins derived from influenza virus (such as HA or NA), Hepatitis B virus (such
as core or capsid proteins), Hepatitis E virus, measles virus, Sindbis virus, Rotavirus, Foot-and-Mouth
Disease virus, Retrovirus, Norwalk virus, human Papilloma virus, HIV, RNA-phages,
Qβ-phage (such as coat proteins), GA-phage, fr-phage, AP205 phage, and Ty (such as
retrotransposon Ty protein p1). VLPs are discussed further in WO 03/024480, WO 03/024481,
and Refs. 6, 7, 8 and 9. Virosomes are discussed further in, for example, Ref. 10.

D. Bacterial or Microbial Derivatives 
Adjuvants suitable for use in the invention include bacterial or microbial derivatives such as:

(1) Non-toxic derivatives of enterobacterial lipopolysaccharide (LPS) 

Such derivatives include Monophosphoryl lipid A (MPL) and 3-O-deacylated MPL (3dMPL).
3dMPL is a mixture of 3-O-deacylated monophosphoryl lipid A with 4, 5 or 6 acylated chains.
A preferred "small particle" form of 3-O-deacylated monophosphoryl lipid A is disclosed in EP
0 689 454. Such "small particles" of 3dMPL are small enough to be sterile filtered through a
0.22 micron membrane (see EP 0 689 454). Other non-toxic LPS derivatives include
monophosphoryl lipid A mimics, such as aminoalkyl glucosaminide phosphate derivatives, e.g.
RC-529. See Ref. 11.

(2) Lipid A Derivatives 

Lipid A derivatives include derivatives of lipid A from Escherichia coli such as OM-174.
OM-174 is described, for example, in Refs. 12 and 13.

(3) Immunostimulatory oligonucleotides 

Immunostimulatory oligonucleotides suitable for use as adjuvants in the invention include
nucleotide sequences containing a CpG motif (a dinucleotide sequence in which an unmethylated
cytosine is linked by a phosphate bond to a following guanosine). Bacterial double-stranded RNA or
oligonucleotides containing palindromic or poly(dG) sequences have also been shown to be
immunostimulatory.

CpG oligonucleotides can include nucleotide modifications/analogs such as phosphorothioate modifications
and can be double-stranded or single-stranded. Optionally, the guanosine may be replaced with 
an analog such as 2'-deoxy-7-deazaguanosine. See ref. 14, WO 02/26757 and WO 99/62923 for 
examples of possible analog substitutions. The adjuvant effect of CpG oligonucleotides is further 
discussed in Refs. 15, 16, WO 98/40100, U.S. Patent No. 6,207,646, U.S. Patent No. 6,239,116, 
and U.S. Patent No. 6,429,199. 

The CpG sequence may be directed to TLR9, such as the motif GTCGTT or TTCGTT. See ref.
17. The CpG sequence may be specific for inducing a Th1 immune response, such as a CpG-A
ODN, or it may be more specific for inducing a B cell response, such as a CpG-B ODN. CpG-A
and CpG-B ODNs are discussed in refs. 18, 19 and WO 01/95935. Preferably, the CpG is a
CpG-A ODN.
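As an illustration only, the short Python sketch below locates CpG dinucleotides and the two TLR9-directed hexamer motifs named above (GTCGTT and TTCGTT) in a candidate oligonucleotide written 5' to 3'. The function names and the example sequence are hypothetical; the sketch does not model nucleotide modifications such as phosphorothioate linkages, methylation status, or the CpG-A/CpG-B classification.

```python
import re

# Hexamer motifs named in the text as TLR9-directed CpG sequences.
TLR9_MOTIFS = ("GTCGTT", "TTCGTT")


def cpg_positions(seq: str) -> list[int]:
    """Return 0-based start positions of every 'CG' dinucleotide (read 5'->3')."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def tlr9_motif_hits(seq: str) -> dict[str, list[int]]:
    """Return start positions of each named motif; a lookahead keeps overlapping hits."""
    s = seq.upper()
    return {m: [hit.start() for hit in re.finditer(f"(?={m})", s)] for m in TLR9_MOTIFS}


# Hypothetical example sequence, used only to exercise the functions.
example = "TCGTCGTTTTGTCGTTTTGTCGTT"
print(cpg_positions(example))
print(tlr9_motif_hits(example))
```

Matching is case-insensitive so that sequences written in lower case are handled the same way.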

Preferably, the CpG oligonucleotide is constructed so that the 5' end is accessible for receptor
recognition. Optionally, two CpG oligonucleotide sequences may be attached at their 3' ends to
form "immunomers". See, for example, refs. 20, 21, 22 and WO 03/035836.

(4) ADP-ribosylating toxins and detoxified derivatives thereof 
Bacterial ADP-ribosylating toxins and detoxified derivatives thereof may be used as adjuvants in
the invention. Preferably, the protein is derived from E. coli (i.e., E. coli heat labile enterotoxin,
"LT"), cholera ("CT"), or pertussis ("PT"). The use of detoxified ADP-ribosylating toxins as
mucosal adjuvants is described in WO 95/17211 and as parenteral adjuvants in WO 98/42375.
The toxin or toxoid is preferably in the form of a holotoxin, comprising both A and B subunits.
Preferably, the A subunit contains a detoxifying mutation; preferably the B subunit is not
mutated. Preferably, the adjuvant is a detoxified LT mutant such as LT-K63, LT-R72, and
LT-R192G. The use of ADP-ribosylating toxins and detoxified derivatives thereof, particularly
LT-K63 and LT-R72, as adjuvants can be found in Refs. 23, 24, 25, 26, 27, 28, 29 and 30, each
of which is specifically incorporated by reference herein in its entirety. Numerical reference
for amino acid substitutions is preferably based on the alignments of the A and B subunits of
ADP-ribosylating toxins set forth in Domenighini et al., Mol. Microbiol (1995) 15(6):1165-1167,
specifically incorporated herein by reference in its entirety.

E. Human Immunomodulators 

Human immunomodulators suitable for use as adjuvants in the invention include cytokines, such
as interleukins (e.g. IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (e.g. interferon-γ),
macrophage colony stimulating factor, and tumor necrosis factor.

F. Bioadhesives and Mucoadhesives 

Bioadhesives and mucoadhesives may also be used as adjuvants in the invention. Suitable
bioadhesives include esterified hyaluronic acid microspheres (Ref. 31) or mucoadhesives such as
cross-linked derivatives of poly(acrylic acid), polyvinyl alcohol, polyvinyl pyrrolidone,
polysaccharides and carboxymethylcellulose. Chitosan and derivatives thereof may also be used
as adjuvants in the invention (e.g. ref. 32).

G. Microparticles 

Microparticles may also be used as adjuvants in the invention. Microparticles (i.e. particles of
~100 nm to ~150 μm in diameter, more preferably ~200 nm to ~30 μm in diameter, and most
preferably ~500 nm to ~10 μm in diameter) formed from materials that are biodegradable and
non-toxic (e.g. a poly(α-hydroxy acid), a polyhydroxybutyric acid, a polyorthoester, a
polyanhydride, a polycaprolactone, etc.) are preferred, with poly(lactide-co-glycolide) being
particularly preferred. The microparticles are optionally treated to have a negatively-charged
surface (e.g. with SDS) or a positively-charged surface (e.g. with a cationic detergent, such as CTAB).
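The diameter ranges given above can be expressed as a simple lookup. In this hypothetical sketch the approximate (~) limits are treated as hard, inclusive bounds in nanometres; that is an assumption, since the text gives only approximate ranges. The function name is illustrative and not taken from the source.

```python
def microparticle_size_tier(diameter_nm: float) -> str:
    """Classify a particle diameter (in nm) against the ranges stated in the text.

    Assumption: the approximate limits are applied as inclusive cut-offs,
    checked from the narrowest (most preferred) range outward.
    """
    if 500 <= diameter_nm <= 10_000:      # ~500 nm to ~10 um
        return "most preferred"
    if 200 <= diameter_nm <= 30_000:      # ~200 nm to ~30 um
        return "more preferred"
    if 100 <= diameter_nm <= 150_000:     # ~100 nm to ~150 um
        return "within stated range"
    return "outside stated range"


for d in (150, 250, 1_000, 50_000, 200_000):
    print(d, microparticle_size_tier(d))
```

Checking the narrowest range first means each diameter is reported at the most specific tier it satisfies.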

H. Liposomes 

Examples of liposome formulations suitable for use as adjuvants are described in U.S. Patent No. 
6,090,406, U.S. Patent No. 5,916,588, and EP 0 626 169. 
I. Polyoxyethylene Ether and Polyoxyethylene Ester Formulations

Adjuvants suitable for use in the invention include polyoxyethylene ethers and polyoxyethylene
esters (Ref. 33). Such formulations further include polyoxyethylene sorbitan ester surfactants in
combination with an octoxynol (Ref. 34) as well as polyoxyethylene alkyl ethers or ester
surfactants in combination with at least one additional non-ionic surfactant such as an octoxynol
(Ref. 35).

Preferred polyoxyethylene ethers are selected from the following group: polyoxyethylene-9-lauryl
ether (laureth 9), polyoxyethylene-9-stearyl ether, polyoxyethylene-8-stearyl ether,
polyoxyethylene-4-lauryl ether, polyoxyethylene-35-lauryl ether, and polyoxyethylene-23-lauryl
ether.

J. Polyphosphazene (PCPP)

PCPP formulations are described, for example, in Refs. 36 and 37.


K. Muramyl peptides

Examples of muramyl peptides suitable for use as adjuvants in the invention include
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine
(nor-MDP), and N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE).

L. Imidazoquinoline Compounds

Examples of imidazoquinoline compounds suitable for use as adjuvants in the invention include
Imiquimod and its homologues, described further in Refs. 38 and 39.

The invention may also comprise combinations of aspects of one or more of the adjuvants 
identified above. For example, the following adjuvant compositions may be used in the 
invention: 

(1) a saponin and an oil-in-water emulsion (ref. 40); 

(2) a saponin (e.g., QS21) + a non-toxic LPS derivative (e.g., 3dMPL) (see WO 94/00153);

(3) a saponin (e.g., QS21) + a non-toxic LPS derivative (e.g., 3dMPL) + a
cholesterol;

(4) a saponin (e.g. QS21) + 3dMPL + IL-12 (optionally + a sterol) (Ref. 41); 
combinations of 3dMPL with, for example, QS21 and/or oil-in-water emulsions (Ref. 42); 
(5) SAF, containing 10% Squalane, 0.4% Tween 80, 5% Pluronic block polymer
L121, and thr-MDP, either microfluidized into a submicron emulsion or vortexed to generate a 
larger particle size emulsion. 

(6) Ribi™ adjuvant system (RAS), (Ribi Immunochem) containing 2% Squalene, 
0.2% Tween 80, and one or more bacterial cell wall components from the group consisting of 
monophosphoryl lipid A (MPL), trehalose dimycolate (TDM), and cell wall skeleton (CWS),
preferably MPL + CWS (Detox™); and 

(7) one or more mineral salts (such as an aluminum salt) + a non-toxic derivative of 
LPS (such as 3dMPL).

Aluminium salts and MF59 are preferred adjuvants for parenteral immunisation. Mutant bacterial 
toxins are preferred mucosal adjuvants. 

The immunogenic compositions (e.g. the immunising antigen/immunogen/polypeptide/protein/nucleic acid,
pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such as water, saline,
glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH 
buffering substances, and the like, may be present in such vehicles. 

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or 
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection may also 
be prepared. The preparation also may be emulsified or encapsulated in liposomes for enhanced adjuvant 
effect, as discussed above under pharmaceutically acceptable carriers. 

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the 
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components, as 
needed. By "immunologically effective amount", it is meant that the administration of that amount to an 
individual, either in a single dose or as part of a series, is effective for treatment or prevention. This amount 
varies depending upon the health and physical condition of the individual to be treated, the taxonomic group 
of individual to be treated (eg. nonhuman primate, primate, etc.), the capacity of the individual's immune 
system to synthesize antibodies, the degree of protection desired, the formulation of the vaccine, the treating 
doctor's assessment of the medical situation, and other relevant factors. It is expected that the amount will
fall in a relatively broad range that can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, e.g. by injection, either
subcutaneously, intramuscularly, or transdermally/transcutaneously (e.g. WO98/20734). Additional formulations
suitable for other modes of administration include oral and pulmonary formulations, suppositories, and 
transdermal applications. Dosage treatment may be a single dose schedule or a multiple dose schedule. The 
vaccine may be administered in conjunction with other immunoregulatory agents. 
As an alternative to protein-based vaccines, DNA vaccination may be used [e.g. Robinson & Torres (1997)
Seminars in Immunol 9:271-283; Donnelly et al. (1997) Annu Rev Immunol 15:617-648; later herein].
Gene Delivery Vehicles 

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of the 
invention, to be delivered to the mammal for expression in the mammal, can be administered either locally 
or systemically. These constructs can utilize viral or non-viral vector approaches in in vivo or ex vivo 
modality. Expression of such coding sequence can be induced using endogenous mammalian or 
heterologous promoters. Expression of the coding sequence in vivo can be either constitutive or regulated. 
The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid 
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral, 
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can also be an 
astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus, picornavirus, poxvirus, 
or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy 1:51-64; Kimura (1994) Human 
Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy 6:185-193; and Kaplitt (1994) Nature 
Genetics 6:148-153. 

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector is 
employable in the invention, including B, C and D type retroviruses, xenotropic retroviruses (for example,
NZB-X1, NZB-X2 and NZB9-1; see O'Neill (1985) J. Virol 53:160), polytropic retroviruses (e.g. MCF and
MCF-MLV; see Kelly (1983) J. Virol 45:291), spumaviruses and lentiviruses. See RNA Tumor Viruses,
Second Edition, Cold Spring Harbor Laboratory, 1985. 

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For example, 
retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site from a Rous Sarcoma 
Virus, a packaging signal from a Murine Leukemia Virus, and an origin of second strand synthesis from an 
Avian Leukosis Virus. 

These recombinant retroviral vectors may be used to generate transduction competent retroviral vector 
particles by introducing them into appropriate packaging cell lines (see US patent 5,591,624). Retrovirus 
vectors can be constructed for site-specific integration into host cell DNA by incorporation of a chimeric 
integrase enzyme into the retroviral particle (see WO96/37626). It is preferable that the recombinant viral
vector is a replication defective recombinant virus. 

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known in the art, 
are readily prepared (see WO95/30763 and WO92/05266), and can be used to create producer cell lines 
(also termed vector cell lines or "VCLs") for the production of recombinant vector particles. Preferably, the 
packaging cell lines are made from human parent cells (eg. HT1080 cells) or mink parent cell lines, which 
eliminates inactivation in human serum. 
Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian Leukosis Virus,
Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing Virus, Murine Sarcoma
Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly preferred Murine Leukemia 
Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol 19:19-25), Abelson (ATCC No. 
VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No. VR-590), Kirsten, Harvey Sarcoma Virus
and Rauscher (ATCC No. VR-998) and Moloney Murine Leukemia Virus (ATCC No. VR-190). Such 
retroviruses may be obtained from depositories or collections such as the American Type Culture Collection 
("ATCC") in Rockville, Maryland or isolated from known sources using commonly available techniques. 
Exemplary known retroviral gene therapy vectors employable in this invention include those described in 
patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468; WO89/05349, 
WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698, WO93/25234, WO93/11230,
WO93/10218, WO91/02805, WO91/02825, WO95/07994, US 5,219,740, US 4,405,712, US 4,861,719, US 
4,980,289, US 4,777,127, US 5,591,624. See also Vile (1993) Cancer Res 53:3860-3864; Vile (1993) 
Cancer Res 53:962-967; Ram (1993) Cancer Res 53:83-88; Takamiya (1992) J Neurosci Res
33:493-503; Baba (1993) J Neurosurg 79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad
Sci 81:6349; and Miller (1990) Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention. See, for 
example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and WO93/07283, 
WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors employable in this 
invention include those described in the above referenced documents and in WO94/12649, WO93/03769,
WO93/19191, WO94/28938, WO95/11984, WO95/00655, WO95/27071, WO95/29993, WO95/34671,
WO96/05320, WO94/08026, WO94/11506, WO93/06223, WO94/24299, WO95/14102, WO95/24297,
WO95/02697, WO94/28152, WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and
WO95/09654. Alternatively, administration of DNA linked to killed adenovirus as described in Curiel 
(1992) Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also 
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such vectors for use 
in this invention are the AAV-2 based vectors disclosed in Srivastava, WO93/09239. Most preferred AAV 
vectors comprise the two AAV inverted terminal repeats in which the native D-sequences are modified by 
substitution of nucleotides, such that at least 5 native nucleotides and up to 18 native nucleotides, preferably 
at least 10 native nucleotides up to 18 native nucleotides, most preferably 10 native nucleotides are retained 
and the remaining nucleotides of the D-sequence are deleted or replaced with non-native nucleotides. The 
native D-sequences of the AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in 
each AAV inverted terminal repeat (i.e. there is one sequence at each end) which are not involved in hairpin (HP)
formation. The non-native replacement nucleotide may be any nucleotide other than the nucleotide found in
the native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19,
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of such an 
AAV vector is psub201 (see Samulski (1987) Virol 61:3096). Another exemplary AAV vector is the 
Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US Patent 5,478,745. Still
other vectors are those disclosed in Carter US Patent 4,797,368 and Muzyczka US Patent 5,139,941,
Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a further example of an AAV vector
employable in this invention is SSV9AFABTKneo, which contains the AFP enhancer and albumin
promoter and directs expression predominantly in the liver. Its structure and construction are disclosed in Su
(1996) Human Gene Therapy 7:463-470. Additional AAV gene therapy vectors are described in US
5,354,678, US 5,173,414, US 5,139,941, and US 5,252,479.
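The D-sequence modification rule described above (a 20-nucleotide native D-sequence, of which at least 5 and up to 18, preferably at least 10, most preferably 10, native nucleotides are retained) can be checked mechanically. The sketch below is a hypothetical illustration, not a vector design tool: it assumes the native and modified D-sequences are supplied as aligned 20-character strings, with "-" marking a deleted position, and counts how many native nucleotides are retained. Any non-deleted position that does not match is, by construction, a non-native replacement nucleotide. The function name and input convention are assumptions.

```python
D_SEQUENCE_LENGTH = 20  # native D-sequence length stated in the text


def classify_modified_d_sequence(native: str, modified: str) -> str:
    """Classify a modified D-sequence by the number of native nucleotides retained.

    Hypothetical input convention: `modified` is aligned to `native` position by
    position, and '-' marks a deleted nucleotide.
    """
    native, modified = native.upper(), modified.upper()
    if len(native) != D_SEQUENCE_LENGTH or len(modified) != D_SEQUENCE_LENGTH:
        raise ValueError("both sequences must be 20 characters long")
    if set(native) - set("ACGT") or set(modified) - set("ACGT-"):
        raise ValueError("unexpected character in sequence")

    # A position counts as retained only if it still carries the native nucleotide.
    retained = sum(n == m for n, m in zip(native, modified))

    if retained == 10:
        return "most preferred"        # exactly 10 native nucleotides retained
    if 10 < retained <= 18:
        return "preferred"             # at least 10, up to 18
    if 5 <= retained <= 18:
        return "acceptable"            # at least 5, up to 18
    return "outside stated range"      # fewer than 5, or more than 18 retained


native = "ACGT" * 5  # hypothetical 20-nt placeholder, not a real AAV D-sequence
print(classify_modified_d_sequence(native, native[:10] + "-" * 10))
```

An unmodified sequence retains all 20 native nucleotides and is therefore reported as outside the stated range.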

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred examples are 
herpes simplex virus vectors containing a sequence encoding a thymidine kinase polypeptide such as those 
disclosed in US 5,288,641 and EP0176170 (Roizman). Additional exemplary herpes simplex virus vectors 
include HFEM/ICP6-LacZ disclosed in WO95/04139 (Wistar Institute), pHSVlac described in Geller
(1988) Science 241:1667-1669 and in WO90/09441 and WO92/07945, HSV Us3::pgC-lacZ described in
Fink (1992) Human Gene Therapy 3:11-19 and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242
(Breakefield), and those deposited with the ATCC with accession numbers VR-977 and VR-260.
Also contemplated are alphavirus gene therapy vectors that can be employed in this invention. Preferred
alphavirus vectors are Sindbis virus vectors. Other togaviruses, such as Semliki Forest virus (ATCC VR-67; ATCC
VR-1247), Middelburg virus (ATCC VR-370), Ross River virus (ATCC VR-373; ATCC VR-1246) and
Venezuelan equine encephalitis virus (ATCC VR923; ATCC VR-1250; ATCC VR-1249; ATCC VR-532),
and those described in US patents 5,091,309, 5,217,879, and WO92/10578, may also be used. More particularly,
those alphavirus vectors described in US Serial No. 08/405,627, filed March 15, 1995, WO94/21792, WO92/10578,
WO95/07994, US 5,091,309 and US 5,217,879 are employable. Such alphaviruses may be obtained from
depositories or collections such as the ATCC in Rockville, Maryland or isolated from known sources using
commonly available techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see
USSN 08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing the 
nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered expression
systems. Preferably, the eukaryotic layered expression systems of the invention are derived from alphavirus
vectors and most preferably from Sindbis viral vectors.

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for 
example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol 
Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in Arnold (1990) J 
Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for example ATCC VR-111 and
ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci 86:317; Flexner (1989) Ann 
NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US 4,769,330 and WO89/01973; 
SV40 virus, for example ATCC VR-305 and those described in Mulligan (1979) Nature 277:108 and 
Madzak (1992) J Gen Virol 73:1533; influenza virus, for example ATCC VR-797 and recombinant
influenza viruses made employing reverse genetics techniques as described in US 5,166,057 and in Enami
(1990) Proc Natl Acad Sci 87:3802-3805; Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989)
Cell 59:110 (see also McMichael (1983) NEJ Med 309:13, and Yap (1978) Nature 273:238 and Nature
(1979) 277:108); human immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992)

J. Virol. 66:2731; measles virus, for example ATCC VR-67 and VR-1247 and those described in
EP-0440219; Aura virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC
VR-1240; Cabassou virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and
ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; Getah virus, for example ATCC VR-369
and ATCC VR-1243; Kyzylagach virus, for example ATCC VR-927; Mayaro virus, for example ATCC

VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244; Ndumu virus, for example
ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245; Tonate virus, for example 
ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for example ATCC VR-374; 
Whataroa virus, for example ATCC VR-926; Y-62-33 virus, for example ATCC VR-375; O'Nyong virus, 
Eastern encephalitis virus, for example ATCC VR-65 and ATCC VR-1242; Western encephalitis virus, for 

example ATCC VR-70, ATCC VR-1251, ATCC VR-622 and ATCC VR-1252; and coronavirus, for
example ATCC VR-740 and those described in Hamre (1966) Proc Soc Exp Biol Med 121:190. 
Delivery of the compositions of this invention into cells is not limited to the above mentioned viral vectors. 
Other delivery methods and media may be employed such as, for example, nucleic acid expression vectors, 
polycationic condensed DNA linked or unlinked to killed adenovirus alone, for example see US Serial No.
08/366,787, filed December 30, 1994, and Curiel (1992) Hum Gene Ther 3:147-154; ligand linked DNA, for
example see Wu (1989) J Biol Chem 264:16985-16987; eukaryotic cell delivery vehicles, for example
see US Serial No. 08/240,030, filed May 9, 1994, and US Serial No. 08/404,796; deposition of
photopolymerized hydrogel materials; hand-held gene transfer particle gun, as described in US Patent
5,149,655; ionizing radiation as described in US 5,206,152 and in WO92/11033; nucleic charge
neutralization; or fusion with cell membranes. Additional approaches are described in Philip (1994) Mol Cell
Biol 14:2411-2418 and in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867. Briefly, the 
sequence can be inserted into conventional vectors that contain conventional control sequences for high 
level expression, and then incubated with synthetic gene transfer molecules such as polymeric 
DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting ligands such as
asialoorosomucoid, as described in Wu & Wu (1987) J. Biol Chem. 262:4429-4432, insulin as described in 
Hucked (1990) Biochem Pharmacol 40:253-263, galactose as described in Plank (1992) Bioconjugate Chem 
3:533-539, lactose or transferrin. 

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in WO 
90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex beads. DNA 
coated latex beads are efficiently transported into cells after endocytosis initiation by the beads. The method 
may be improved further by treatment of the beads to increase hydrophobicity and thereby facilitate 
disruption of the endosome and release of the DNA into the cytoplasm. 

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral delivery, the
nucleic acid sequences encoding a polypeptide can be inserted into conventional vectors that contain 
conventional control sequences for high level expression, and then be incubated with synthetic gene transfer 
molecules such as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell 
targeting ligands such as asialoorosomucoid, insulin, galactose, lactose, or transferrin. Other delivery 
systems include the use of liposomes to encapsulate DNA comprising the gene under the control of a 
variety of tissue-specific or ubiquitously-active promoters. Further non-viral delivery suitable for use 
includes mechanical delivery systems such as the approach described in Woffendin et al (1994) Proc. Natl
Acad. Sci USA 91(24):11581-11585. Moreover, the coding sequence and the product of expression of such
can be delivered through deposition of photopolymerized hydrogel materials. Other conventional methods 
for gene delivery that can be used for delivery of the coding sequence include, for example, use of 
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for activating
transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120 and
4,762,915; in WO 95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer, Biochemistry,
pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochim Biophys Acta 600:1; Bayer
(1979) Biochim Biophys Acta 550:464; Rivnay (1987) Meth Enzymol 149:119; Wang (1987) Proc Natl
Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy vehicle, as
the term is defined above. For purposes of the present invention, an effective dose will be from about 0.01
mg/kg to about 50 mg/kg, or from about 0.05 mg/kg to about 10 mg/kg, of the DNA constructs in the
individual to which it is administered.
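To make the dose ranges concrete, the sketch below converts the per-kilogram ranges stated above into a total DNA mass for a given body weight. The function and constant names are hypothetical, and the calculation is simple proportional scaling only; it does not account for any of the individual-specific factors that would govern an actual dose.

```python
# Per-kilogram ranges stated in the text (mg of DNA construct per kg body weight).
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def total_dose_range_mg(body_weight_kg: float,
                        per_kg: tuple[float, float] = BROAD_RANGE_MG_PER_KG) -> tuple[float, float]:
    """Scale a (low, high) mg/kg range to total mg for one individual."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low, high = per_kg
    return low * body_weight_kg, high * body_weight_kg


print(total_dose_range_mg(70))
print(total_dose_range_mg(70, NARROW_RANGE_MG_PER_KG))
```

For a hypothetical 70 kg individual this gives roughly 0.7 mg to 3,500 mg for the broader range and 3.5 mg to 700 mg for the narrower one (printed values may show small floating-point rounding).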
Delivery Methods

Once formulated, the polynucleotide compositions of the invention can be administered (1) directly to the 
subject; (2) delivered ex vivo, to cells derived from the subject; or (3) in vitro for expression of recombinant 
proteins. The subjects to be treated can be mammals or birds. Also, human subjects can be treated. 
Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously,
intraperitoneally, intravenously or intramuscularly, or by delivery to the interstitial space of a tissue. The
compositions can also be administered into a lesion. Other modes of administration include oral and
pulmonary administration, suppositories, and transdermal or transcutaneous applications (e.g. see
WO98/20734), needles, and gene guns or hyposprays. Dosage treatment may be a single dose schedule or a
multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known in the art
and described in e.g. WO93/14778. Examples of cells useful in ex vivo applications include stem cells,
particularly hematopoietic cells, lymph cells, macrophages, dendritic cells, or tumor cells.
Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished by the
following procedures, for example, dextran-mediated transfection, calcium phosphate precipitation,
polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s)
in liposomes, and direct microinjection of the DNA into nuclei, all well known in the art.
Polynucleotide and polypeptide pharmaceutical compositions

In addition to the pharmaceutically acceptable carriers and salts described above, the following additional 
agents can be used with polynucleotide and/or polypeptide compositions.

A. Polypeptides

Examples include polypeptides such as, without limitation: asialoorosomucoid (ASOR); transferrin; 
asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons; 
granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony stimulating factor (G-CSF); 
macrophage colony stimulating factor (M-CSF); stem cell factor; and erythropoietin. Viral antigens, such as 
envelope proteins, can also be used. Proteins from other invasive organisms can also be used, such as the 
17 amino acid peptide from the circumsporozoite protein of Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc. 

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, thyroid 
hormone, or vitamins such as folic acid.

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a preferred 
embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or polysaccharides 
can be included. In a preferred embodiment of this aspect, the polysaccharide is dextran or DEAE-dextran. 
Chitosan and poly(lactide-co-glycolide) can also be included.
D. Lipids and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes prior to 
delivery to the subject or to cells derived therefrom.

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or entrap and 
retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary but will generally be 
around 1:1 (mg DNA:micromoles lipid), or an excess of lipid. For a review of the use of liposomes as carriers 
for delivery of nucleic acids, see Hug and Sleight (1991) Biochim. Biophys. Acta. 1097:1-17; Straubinger 
(1983) Meth. Enzymol. 101:512-527.
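The ratio above mixes units (milligrams of DNA against micromoles of lipid), so preparing a batch requires a conversion. A minimal sketch, assuming the caller supplies the lipid's molecular weight; the function name and the 700 g/mol figure in the usage line are hypothetical placeholders, not values from the text.

```python
from typing import Optional, Tuple


def lipid_required(dna_mg: float,
                   umol_lipid_per_mg_dna: float = 1.0,
                   lipid_mw_g_per_mol: Optional[float] = None
                   ) -> Tuple[float, Optional[float]]:
    """Lipid needed for a DNA:lipid ratio expressed as mg DNA : umol lipid.

    umol_lipid_per_mg_dna = 1.0 corresponds to the ~1:1 ratio in the text;
    values above 1.0 give the excess of lipid the text also allows.
    Returns (umol of lipid, mg of lipid); the mass is None unless a
    molecular weight is supplied.
    """
    if dna_mg <= 0 or umol_lipid_per_mg_dna <= 0:
        raise ValueError("DNA amount and ratio must be positive")
    umol_lipid = dna_mg * umol_lipid_per_mg_dna
    mg_lipid = None
    if lipid_mw_g_per_mol is not None:
        # 1 umol x (g/mol) = 1 ug, so divide by 1000 to express the mass in mg.
        mg_lipid = umol_lipid * lipid_mw_g_per_mol / 1000.0
    return umol_lipid, mg_lipid


if __name__ == "__main__":
    # 0.5 mg DNA at 1:1, with a hypothetical lipid of 700 g/mol.
    print(lipid_required(0.5, 1.0, 700.0))
```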

Liposomal preparations for use in the present invention include cationic (positively charged), anionic 
(negatively charged) and neutral preparations. Cationic liposomes have been shown to mediate intracellular 
delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA 84:7413-7416), mRNA (Malone 
(1989) Proc. Natl. Acad. Sci. USA 86:6077-6081), and purified transcription factors (Debs (1990) J. Biol. 
Chem. 265:10189-10192) in functional form.

Cationic liposomes are readily available. For example, 
N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium (DOTMA) liposomes are available under the 
trademark Lipofectin, from GIBCO BRL, Grand Island, NY (see also Felgner, supra). Other 
commercially available liposomes include Transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer).

Other cationic liposomes can be prepared from readily available materials using techniques well known in 
the art. See, e.g., Szoka (1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of 
the synthesis of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.
Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids 
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials include 
phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC), 
dioleoylphosphatidyl glycerol (DOPG), and dioleoylphosphatidyl ethanolamine (DOPE), among others. These 
materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate ratios. Methods 
for making liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs), or large 
unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared using methods 
known in the art. See, e.g., Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka (1978) Proc. Natl. Acad. 
Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta 394:483; Wilson (1979) Cell 
17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629; Ostro (1977) Biochem. Biophys. Res. 
Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA 76:3348; Enoch & Strittmatter (1979) Proc. 
Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol. Chem. 255:10431; Szoka & Papahadjopoulos 
(1978) Proc. Natl. Acad. Sci. USA 75:145; and Schaefer-Ridder (1982) Science 215:166.
E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. Examples of 
lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants, fragments, or 
fusions of these proteins can also be used. Also, modifications of naturally occurring lipoproteins can be 
used, such as acetylated LDL. These lipoproteins can target the delivery of polynucleotides to cells 
expressing lipoprotein receptors. Preferably, if lipoproteins are included with the polynucleotide to be 
delivered, no other targeting ligand is included in the composition. 

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are known as 
apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and identified. At least two of 
these contain several proteins, designated by Roman numerals: AI, AII, AIV; CI, CII, CIII. 
A lipoprotein can comprise more than one apoprotein. For example, naturally occurring chylomicrons 
comprise apoproteins A, B, C, and E; over time these lipoproteins lose A and acquire C and E. VLDL 
comprises A, B, C, and E apoproteins; LDL comprises apoprotein B; and HDL comprises apoproteins A, C, and E.
The amino acid sequences of these apoproteins are known and are described in, for example, Breslow (1985) 
Annu. Rev. Biochem. 54:699; Law (1986) Adv. Exp. Med. Biol. 151:162; Chen (1986) J. Biol. Chem. 261:12918; 
Kane (1980) Proc. Natl. Acad. Sci. USA 77:2465; and Utermann (1984) Hum. Genet. 65:232.
Lipoproteins contain a variety of lipids, including triglycerides, cholesterol (free and esters), and 
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example, 
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of naturally 
occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The composition of the 
lipids is chosen to aid in conformation of the apoprotein for receptor binding activity. The composition of 
lipids can also be chosen to facilitate hydrophobic interaction and association with the polynucleotide 
binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance. Such 
methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biochem. 255:5454-5460 and Mahley 
(1979) J. Clin. Invest. 64:743-750. Lipoproteins can also be produced by in vitro or recombinant methods by 
expression of the apoprotein genes in a desired host cell. See, for example, Atkinson (1986) Annu. Rev. 
Biophys. Chem. 15:403 and Radding (1958) Biochim. Biophys. Acta 30:443. Lipoproteins can also be 
purchased from commercial suppliers, such as Biomedical Technologies, Inc., Stoughton, Massachusetts, 
USA. Further description of lipoproteins can be found in Zuckermann et al. PCT/US97/14465.
F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired 
polynucleotide/polypeptide to be delivered. 

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are capable of 
neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired location. These agents 
have in vitro, ex vivo, and in vivo applications. Polycationic agents can be used to deliver nucleic acids 
to a living subject intramuscularly, subcutaneously, etc.
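The degree of charge neutralization is commonly expressed as an N/P ratio: moles of cationic groups on the agent per mole of phosphate on the nucleic acid. That convention is not stated in the text and is assumed here. The sketch estimates it for polylysine using two approximations supplied here rather than taken from the source: about 330 g/mol per DNA nucleotide (one phosphate each) and one positive charge per lysine residue of 128.17 g/mol (free-base residue mass; a salt form of commercial polylysine would have a larger residue mass). Function names are hypothetical.

```python
# Approximate constants (assumptions, not taken from the text).
DNA_G_PER_MOL_NUCLEOTIDE = 330.0   # average mass of one DNA nucleotide; one phosphate each
LYSINE_G_PER_MOL_RESIDUE = 128.17  # lysine residue in a chain, free base; one epsilon-amine


def np_ratio(dna_ug: float, polylysine_ug: float,
             residue_g_per_mol: float = LYSINE_G_PER_MOL_RESIDUE,
             charges_per_residue: float = 1.0) -> float:
    """Approximate N/P ratio: moles of cationic amines per mole of DNA phosphate."""
    if dna_ug <= 0 or polylysine_ug < 0:
        raise ValueError("DNA mass must be positive and polycation mass non-negative")
    # ug / (g/mol) = umol; multiply by 1000 for nmol.
    phosphate_nmol = dna_ug / DNA_G_PER_MOL_NUCLEOTIDE * 1000.0
    amine_nmol = polylysine_ug / residue_g_per_mol * charges_per_residue * 1000.0
    return amine_nmol / phosphate_nmol


def polylysine_for_np(dna_ug: float, target_np: float,
                      residue_g_per_mol: float = LYSINE_G_PER_MOL_RESIDUE) -> float:
    """Mass of polylysine (ug) expected to give the target N/P ratio."""
    if dna_ug <= 0 or target_np <= 0:
        raise ValueError("DNA mass and target ratio must be positive")
    return target_np * dna_ug / DNA_G_PER_MOL_NUCLEOTIDE * residue_g_per_mol


if __name__ == "__main__":
    # Hypothetical: polylysine needed for a 2:1 charge excess on 10 ug of plasmid.
    print(round(polylysine_for_np(10.0, 2.0), 2))
```

An N/P ratio near 1 corresponds to nominal charge neutrality; values above 1 indicate the net positive charge the text associates with polycationic agents.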

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine, 
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin, DNA 
binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such as φX174. 
Transcriptional factors also contain domains that bind DNA and therefore may be useful as nucleic acid 
condensing agents. Briefly, transcriptional factors such as C/EBP, c-jun, c-fos, AP-1, AP-2, AP-3, CPF, 
Prot-1, Sp-1, Oct-1, Oct-2, CREB, and TFIID contain basic domains that bind DNA sequences. 
Organic polycationic agents include spermine, spermidine, and putrescine.

The dimensions and physical properties of a polycationic agent can be extrapolated from the list 
above to construct other polypeptide polycationic agents or to produce synthetic polycationic agents. 
Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene. 
Lipofectin™ and LipofectAMINE™ are monomers that form polycationic complexes when combined with 
polynucleotides/polypeptides.
Immunodiagnostic Assays

Streptococcus antigens of the invention can be used in immunoassays to detect antibody levels (or, 
conversely, anti-Streptococcus antibodies can be used to detect antigen levels). Immunoassays based on 
well-defined, recombinant antigens can be developed to replace invasive diagnostic methods. Antibodies to 
Streptococcus proteins within biological samples, including, for example, blood or serum samples, can be 
detected. The design of the immunoassays is subject to a great deal of variation, and a variety of these are 
known in the art. Protocols for the immunoassay may be based, for example, upon competition, direct 
reaction, or sandwich-type assays. Protocols may also, for example, use solid supports, or may involve 
immunoprecipitation. Most assays involve the use of labeled antibody or polypeptide; the labels may be, for 
example, fluorescent, chemiluminescent, radioactive, or dye molecules. Assays which amplify the signals 
from the probe are also known; examples include assays which utilize biotin and avidin, and enzyme-labeled 
and enzyme-mediated immunoassays, such as ELISA assays.
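ELISA-type assays are usually quantified against a standard curve. The text does not specify a curve model; one widely used choice, assumed here, is the four-parameter logistic (4PL), y = d + (a - d) / (1 + (x/c)^b), where a is the response at zero concentration, d the response at infinite concentration, c the inflection point and b the slope factor. The sketch inverts an already-fitted curve to read a sample's concentration from its signal; the class name and the parameter values in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourPL:
    """Four-parameter logistic standard curve (parameters assumed already fitted)."""
    a: float  # response at zero concentration
    b: float  # slope factor (> 0)
    c: float  # concentration at the inflection point (> 0)
    d: float  # response at infinite concentration

    def response(self, x: float) -> float:
        """Predicted signal (e.g. optical density) at concentration x >= 0."""
        return self.d + (self.a - self.d) / (1.0 + (x / self.c) ** self.b)

    def concentration(self, y: float) -> float:
        """Back-calculate concentration from an observed signal y."""
        low, high = sorted((self.a, self.d))
        if not low < y < high:
            raise ValueError("signal lies outside the quantifiable range of the curve")
        # Rearranged from y = d + (a - d) / (1 + (x/c)^b).
        return self.c * ((self.a - self.d) / (y - self.d) - 1.0) ** (1.0 / self.b)


if __name__ == "__main__":
    # Hypothetical fitted parameters for an assay whose signal rises with antibody level.
    curve = FourPL(a=0.05, b=1.2, c=25.0, d=2.5)
    print(curve.concentration(1.0))
```

Signals at or beyond the asymptotes are rejected rather than extrapolated, since the inverse is undefined there.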

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed by 
packaging the appropriate materials, including the compositions of the invention, in suitable containers, 
along with the remaining reagents and materials (for example, suitable buffers, salt solutions, etc.) required 
for the conduct of the assay, as well as a suitable set of assay instructions.
Use of Polypeptides to Screen for Peptide Analogs and Antagonists

Polypeptides encoded by the instant polynucleotides and corresponding full-length genes can be used to 
screen peptide libraries to identify binding partners, such as receptors, from within the library. Peptide 
libraries can be synthesized according to methods known in the art (e.g. US patent 5,010,175; 
WO91/17823). Agonists or antagonists of the polypeptides of the invention can be screened using any 
available method known in the art, such as signal transduction, antibody binding, receptor binding, 
mitogenic assays, chemotaxis assays, etc. The assay conditions ideally should resemble the conditions under 
which the native activity is exhibited in vivo, that is, physiologic pH, temperature, and ionic strength. 
Suitable agonists or antagonists will exhibit strong inhibition or enhancement of the native activity at 
concentrations that do not cause toxic side effects in the subject. Agonists or antagonists that compete for 
binding to the native polypeptide can require concentrations equal to or greater than the native 
concentration, while inhibitors capable of binding irreversibly to the polypeptide can be added in 
concentrations on the order of the native concentration.
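The concentration argument for competing agents can be made concrete with a simple equilibrium model, assumed here rather than taken from the text: for a native ligand L and a reversible competitor I binding the same single site with dissociation constants K_d and K_i, the fraction of sites occupied by L is L / (L + K_d(1 + I/K_i)). The sketch computes that occupancy and the competitor concentration needed to reach a target occupancy. Function names and all concentrations in the usage lines are hypothetical.

```python
def native_occupancy(ligand: float, kd: float,
                     competitor: float = 0.0, ki: float = 1.0) -> float:
    """Fraction of binding sites occupied by the native ligand.

    Simple equilibrium model (an assumption): one site, reversible binding,
    ligand and competitor present in excess of the binding partner.
    """
    if ligand <= 0 or kd <= 0 or ki <= 0 or competitor < 0:
        raise ValueError("concentrations and constants must be positive")
    apparent_kd = kd * (1.0 + competitor / ki)  # the competitor raises the apparent Kd
    return ligand / (ligand + apparent_kd)


def competitor_needed(ligand: float, kd: float, ki: float,
                      target_fraction: float) -> float:
    """Competitor concentration that lowers native occupancy to target_fraction."""
    if not 0.0 < target_fraction < native_occupancy(ligand, kd):
        raise ValueError("target must lie between 0 and the uninhibited occupancy")
    # Solve target = L / (L + Kd * (1 + I / Ki)) for I.
    return ki * (ligand * (1.0 - target_fraction) / (target_fraction * kd) - 1.0)


if __name__ == "__main__":
    # Hypothetical: native ligand at its Kd, competitor of equal affinity.
    print(native_occupancy(1.0, 1.0))             # uninhibited
    print(native_occupancy(1.0, 1.0, 1.0, 1.0))   # competitor at the native concentration
    print(competitor_needed(1.0, 1.0, 1.0, 0.25))
```

Under these assumptions a competitor of equal affinity at the native concentration lowers occupancy only from 1/2 to 1/3, and halving occupancy requires twice the native concentration, in line with the remark that competing agents may need concentrations equal to or greater than the native concentration. Irreversible inhibitors fall outside this equilibrium model.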

Such screening and experimentation can lead to identification of a polypeptide binding partner, such as a 
receptor, encoded by a gene or a cDNA corresponding to a polynucleotide described herein, and at least one 
peptide agonist or antagonist of the binding partner. Such agonists and antagonists can be used to modulate, 
enhance, or inhibit receptor function in cells to which the receptor is native, or in cells that possess the 
receptor as a result of genetic engineering. Further, if the receptor shares biologically important 
characteristics with a known receptor, information about agonist/antagonist binding can facilitate 
development of improved agonists/antagonists of the known receptor. 
Identification of anti-bacterial agents 
Drug Screening Assays 

Of particular interest in the present invention is the identification of agents that have activity in modulating 
expression of one or more of the adhesion-specific genes described herein, so as to inhibit infection and/or 
disease. Of particular interest are screening assays for agents that have a low toxicity for human cells. 
The term "agent" as used herein describes any molecule with the capability of altering or mimicking the 
expression or physiological function of a gene product of a differentially expressed gene. Generally a 
plurality of assay mixtures are run in parallel with different agent concentrations to obtain a differential 
response to the various concentrations. Typically, one of these concentrations serves as a negative control, 
i.e., at zero concentration or below the level of detection.
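A minimal sketch of such a parallel series: a serial dilution of one candidate agent plus a zero-concentration negative control, with each readout expressed as percent inhibition relative to that control. The dilution scheme, the percent-inhibition readout, the function names, and all numbers in the usage lines are hypothetical choices for illustration.

```python
from typing import Dict, List


def dilution_series(top_conc: float, factor: float, n_points: int) -> List[float]:
    """Serial dilution from top_conc by `factor`, followed by a 0.0 negative control."""
    if top_conc <= 0 or factor <= 1 or n_points < 1:
        raise ValueError("need top_conc > 0, factor > 1 and n_points >= 1")
    return [top_conc / factor ** i for i in range(n_points)] + [0.0]


def percent_inhibition(readouts: Dict[float, float]) -> Dict[float, float]:
    """Percent inhibition at each concentration relative to the zero-concentration control."""
    control = readouts.get(0.0)
    if not control:
        raise ValueError("a non-zero readout for the 0.0 control is required")
    return {conc: 100.0 * (1.0 - value / control)
            for conc, value in readouts.items() if conc != 0.0}


if __name__ == "__main__":
    concs = dilution_series(100.0, 10.0, 4)  # hypothetical micromolar values
    print(concs)
    # Hypothetical readouts (e.g. reporter signal) for each condition.
    print(percent_inhibition({100.0: 20.0, 10.0: 60.0, 1.0: 95.0, 0.1: 100.0, 0.0: 100.0}))
```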

Candidate agents encompass numerous chemical classes, including, but not limited to, organic molecules 
(e.g. small organic compounds having a molecular weight of more than 50 and less than about 2,500 
daltons), peptides, antisense polynucleotides, ribozymes, and the like. Candidate agents can comprise 
functional groups necessary for structural interaction with proteins, particularly hydrogen bonding, and 
typically include at least an amine, carbonyl, hydroxyl or carboxyl group, preferably at least two of the 
functional chemical groups. The candidate agents often comprise cyclical carbon or heterocyclic structures 
and/or aromatic or polyaromatic structures substituted with one or more of the above functional groups. 
Candidate agents are also found among biomolecules including, but not limited to: polynucleotides, 
peptides, saccharides, fatty acids, steroids, purines, pyrimidines, derivatives, structural analogs or 
combinations thereof. 
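The molecular-weight window given above for small organic candidates (more than 50 and less than about 2,500 daltons) can serve as a first-pass library filter. A minimal sketch; treating "about 2,500" as a hard cutoff is an assumption, and the function name and compound entries in the usage lines are hypothetical.

```python
from typing import Iterable, List, Tuple

# Window stated in the text for small organic candidates; "about 2,500" is
# treated here as a hard upper cutoff, which is an assumption.
MIN_DA = 50.0
MAX_DA = 2500.0


def within_mw_window(library: Iterable[Tuple[str, float]],
                     min_da: float = MIN_DA,
                     max_da: float = MAX_DA) -> List[str]:
    """Names of library members whose molecular weight satisfies min_da < MW < max_da."""
    return [name for name, mw in library if min_da < mw < max_da]


if __name__ == "__main__":
    # Hypothetical compound identifiers and masses in daltons.
    library = [("cmpd-001", 46.0), ("cmpd-002", 312.4),
               ("cmpd-003", 1890.0), ("cmpd-004", 3100.0)]
    print(within_mw_window(library))
```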

Candidate agents are obtained from a wide variety of sources including libraries of synthetic or natural 
compounds. For example, numerous means are available for random and directed synthesis of a wide 
variety of organic compounds and biomolecules, including expression of randomized oligonucleotides and 
oligopeptides. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and 
animal extracts are available or readily produced. Additionally, natural or synthetically produced libraries 
and compounds are readily modified through conventional chemical, physical and biochemical means, and 
may be used to produce combinatorial libraries. Known pharmacological agents may be subjected to 
directed or random chemical modifications, such as acylation, alkylation, esterification, amidification, etc. 
to produce structural analogs. 
Screening of Candidate Agents In Vitro 

A wide variety of in vitro assays may be used to screen candidate agents for the desired biological activity, 
including, but not limited to, labeled in vitro protein-protein binding assays, protein-DNA binding assays 
(e.g. to identify agents that affect expression), electrophoretic mobility shift assays, immunoassays for 
protein binding, and the like. For example, by providing for the production of large amounts of a 
differentially expressed polypeptide, one can identify ligands or substrates that bind to, modulate or mimic 
the action of the polypeptide. The purified polypeptide may also be used for determination of three-dimensional 
crystal structure, which can be used for modeling intermolecular interactions, transcriptional 
regulation, etc. 

The screening assay can be a binding assay, wherein one or more of the molecules may be joined to a label, 
and the label directly or indirectly provides a detectable signal. Various labels include radioisotopes, 
fluorescers, chemiluminescers, enzymes, specific binding molecules, particles (e.g. magnetic particles), and 
the like. Specific binding molecules include pairs, such as biotin and streptavidin, digoxin and anti-digoxin, 
etc. For the specific binding members, the complementary member would normally be labeled with a 
molecule that provides for detection, in accordance with known procedures.

A variety of other reagents may be included in the screening assays described herein. Where the assay is a 
binding assay, these include reagents like salts, neutral proteins (e.g. albumin), detergents, etc. that are used 
to facilitate optimal protein-protein binding and protein-DNA binding and/or to reduce non-specific or 
background interactions. Reagents that improve the efficiency of the assay, such as protease inhibitors, 
nuclease inhibitors, anti-microbial agents, etc., may be used. The components are added in any 
order that provides for the requisite binding. Incubations are performed at any suitable temperature, 
typically between 4 and 40°C. Incubation periods are selected for optimum activity, but may also be 
optimized to facilitate rapid high-throughput screening. Typically, between 0.1 and 1 hour will be 
sufficient.

Many mammalian genes have homologs in yeast and lower animals. The study of such homologs' 
physiological role and interactions with other proteins in vivo or in vitro can facilitate understanding of 
biological function. In addition to model systems based on genetic complementation, yeast has been shown 
to be a powerful tool for studying protein-protein interactions through the two-hybrid system.
Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen bonding. 
Typically, one sequence will be fixed to a solid support and the other will be free in solution. Then, the two 
sequences will be placed in contact with one another under conditions that favor hydrogen bonding. Factors 
that affect this bonding include: the type and volume of solvent; reaction temperature; time of hybridization; 
agitation; agents to block the non-specific attachment of the liquid phase sequence to the solid support 
(Denhardt's reagent or BLOTTO); concentration of the sequences; use of compounds to increase the rate of 
association of sequences (dextran sulfate or polyethylene glycol); and the stringency of the washing 
conditions following hybridization. See Sambrook et al. [supra] Volume 2, chapter 9, pages 9.47 to 9.57.

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar sequences 
over sequences that differ. For example, the combination of temperature and salt concentration should be 
chosen to be approximately 12 to 20°C below the calculated Tm of the hybrid under study. The 
temperature and salt conditions can often be determined empirically in preliminary experiments in which 
samples of genomic DNA immobilized on filters are hybridized to the sequence of interest and then washed 
under conditions of different stringencies. See Sambrook et al. at page 9.50.

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the DNA 
being blotted and (2) the homology between the probe and the sequences being detected. The total amount 
of the fragment(s) to be studied can vary by orders of magnitude, from 0.1 to 1 µg for a plasmid or phage 
digest to 10⁻⁹ to 10⁻⁸ g for a single-copy gene in a highly complex eukaryotic genome. For lower-complexity 
polynucleotides, substantially shorter blotting, hybridization, and exposure times, a smaller amount of 
starting polynucleotides, and lower specific activity of probes can be used. For example, a single-copy yeast 
gene can be detected with an exposure time of only 1 hour starting with 1 µg of yeast DNA, blotting for two 
hours, and hybridizing for 4-8 hours with a probe of 10⁸ cpm/µg. For a single-copy mammalian gene, a 
conservative approach would start with 10 µg of DNA, blot overnight, and hybridize overnight in the 
presence of 10% dextran sulfate using a probe of greater than 10⁸ cpm/µg, resulting in an exposure time of 
~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe and the 
fragment of interest, and consequently, the appropriate conditions for hybridization and washing. In many 
cases the probe is not 100% homologous to the fragment. Other commonly encountered variables include 
the length and total G+C content of the hybridizing sequences and the ionic strength and formamide content 
of the hybridization buffer. The effects of all of these factors can be approximated by a single equation:

$$T_m = 81 + 16.6\log_{10}(C_i) + 0.4\,[\%(G+C)] - 0.6\,(\%\,\text{formamide}) - \frac{600}{n} - 1.5\,(\%\,\text{mismatch})$$

where $C_i$ is the salt concentration (monovalent ions) and $n$ is the length of the hybrid in base pairs (slightly 
modified from Meinkoth & Wahl (1984) Anal. Biochem. 138:267-284).
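The Tm equation translates directly into code. A minimal sketch that evaluates it term by term, with the same coefficients as the equation in this section; the function name and the example probe parameters in the usage line are hypothetical.

```python
import math


def melting_temperature(salt_molar: float, gc_percent: float, length_bp: int,
                        formamide_percent: float = 0.0,
                        mismatch_percent: float = 0.0) -> float:
    """Tm (deg C) of a DNA-DNA hybrid, evaluated term by term from the equation above.

    salt_molar        -- monovalent cation concentration C_i, mol/L
    gc_percent        -- G+C content of the hybrid, 0-100
    length_bp         -- hybrid length n, base pairs
    formamide_percent -- formamide in the hybridization buffer, 0-100
    mismatch_percent  -- percent mismatch between probe and target, 0-100
    """
    if salt_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


if __name__ == "__main__":
    # Hypothetical probe: 500 bp, 40% G+C, 0.39 M Na+, 50% formamide, 5% mismatch.
    print(round(melting_temperature(0.39, 40.0, 500, 50.0, 5.0), 1))
```

Each term can be checked by hand: for $C_i$ = 1 M, 50% G+C, a 600 bp hybrid and 50% formamide, the terms give 81 + 0 + 20 - 30 - 1 = 70°C. Every 1% of mismatch lowers the estimate by 1.5°C, which is why hybridization temperatures must be reduced when probe and target are not fully homologous.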

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be 
conveniently altered. The temperature of the hybridization and washes and the salt concentration during the 
washes are the simplest to adjust. As the temperature of the hybridization increases (i.e., as stringency 
increases), it becomes less likely for hybridization to occur between strands that are nonhomologous, and as 
a result, background decreases. If the radiolabeled probe is not completely homologous with the immobilized 
fragment (as is frequently the case in gene family and interspecies hybridization experiments), the 
hybridization temperature must be reduced, and background will increase. The temperature of the washes 
affects the intensity of the hybridizing band and the degree of background in a similar manner. The 
stringency of the washes is also increased with decreasing salt concentrations.

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for a probe 
that is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology, and 32°C for 
85% to 90% homology. For lower homologies, formamide content should be lowered and temperature 
adjusted accordingly, using the equation above. If the homology between the probe and the target fragment 
is not known, the simplest approach is to start with both hybridization and wash conditions which are 
nonstringent. If non-specific bands or high background are observed after autoradiography, the filter can be 
washed at high stringency and reexposed. If the time required for exposure makes this approach impractical, 
several hybridization and/or washing stringencies should be tested in parallel.
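The rule of thumb for 50% formamide can be encoded as a small lookup. A minimal sketch: the three bands come from the text, assigning the shared boundary values (95% and 90%) to the higher-temperature band is an assumption, and anything below 85% homology is returned as None because the text directs the user to lower the formamide and recompute from the Tm equation instead. The function name is hypothetical.

```python
from typing import Optional

# (minimum % homology, hybridization temperature in deg C) in 50% formamide, from the text.
# Boundary values (95%, 90%) go to the higher-temperature band; this is an assumption.
_BANDS_50PCT_FORMAMIDE = ((95.0, 42.0), (90.0, 37.0), (85.0, 32.0))


def hybridization_temp_50pct_formamide(homology_percent: float) -> Optional[float]:
    """Suggested hybridization temperature, or None when homology is below 85%."""
    if not 0.0 <= homology_percent <= 100.0:
        raise ValueError("homology must be a percentage between 0 and 100")
    for min_homology, temp_c in _BANDS_50PCT_FORMAMIDE:
        if homology_percent >= min_homology:
            return temp_c
    # Below 85%: lower the formamide content and recompute from the Tm equation instead.
    return None


if __name__ == "__main__":
    for h in (98.0, 92.0, 87.0, 80.0):
        print(h, hybridization_temp_50pct_formamide(h))
```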

Nucleic Acid Probe Assays

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid probes 
according to the invention can determine the presence of cDNA or mRNA. A probe is said to "hybridize" 
with a sequence of the invention if it can form a duplex or double-stranded complex that is stable enough 
to be detected. 

The nucleic acid probes will hybridize to the Streptococcus nucleotide sequences of the invention (including 
both sense and antisense strands). Though many different nucleotide sequences will encode the amino acid 
sequence, the native Streptococcal sequence is preferred because it is the actual sequence present in cells. 
mRNA represents a coding sequence, so a probe for mRNA should be complementary to the coding sequence; 
single-stranded cDNA is complementary to mRNA, so a cDNA probe should be complementary to the 
non-coding sequence.
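The strand relationships above can be illustrated with a reverse-complement function: a probe for a coding-sense target such as mRNA is the reverse complement of the coding sequence, written 5' to 3'. A minimal sketch; it accepts the IUPAC ambiguity codes that appear in the sequence listing (e.g. R, Y, K, W) and preserves lower case. The function name is hypothetical, and the example fragment is copied from SEQ ID NO. 1303 (its reverse complement also appears in SEQ ID NO. 1301).

```python
# IUPAC nucleotide codes and their complements; lower case is preserved.
_IUPAC = "ACGTRYKMSWBDHVN"
_IUPAC_COMPLEMENT = "TGCAYRMKSWVHDBN"
_TABLE = str.maketrans(_IUPAC + _IUPAC.lower(),
                       _IUPAC_COMPLEMENT + _IUPAC_COMPLEMENT.lower())
_ALLOWED = set(_IUPAC + _IUPAC.lower())


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3', returned 5'->3'."""
    unexpected = set(seq) - _ALLOWED
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.translate(_TABLE)[::-1]


if __name__ == "__main__":
    # Coding-strand fragment from SEQ ID NO. 1303; the result could serve as an antisense probe.
    print(reverse_complement("TTGTGGGAACACAGTTGG"))
```

Characters outside the IUPAC alphabet (such as the extraction debris in parts of the sequence listing) are rejected rather than silently passed through.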

The probe sequence need not be identical to the Streptococcal sequence (or its complement); some 
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe can 
form a duplex with target nucleotides that can be detected. Also, the nucleic acid probe can include 
additional nucleotides to stabilize the formed duplex. Additional Streptococcus sequence may also be 
helpful as a label to detect the formed duplex. For example, a non-complementary nucleotide sequence may 
be attached to the 5' end of the probe, with the remainder of the probe sequence being complementary to a 
Streptococcus sequence. Alternatively, non-complementary bases or longer sequences can be interspersed 
into the probe, provided that the probe sequence has sufficient complementarity with a Streptococcus 
sequence to hybridize therewith and thereby form a duplex which can be detected.
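A minimal sketch of the tailed-probe layout just described: a non-complementary tag at the 5' end followed by a segment complementary to the target, plus a check of how well a probe segment pairs with the target. The tag "GCGCGC" in the test is a hypothetical choice, the target fragment is copied from SEQ ID NO. 1303, and the function names are hypothetical; only unambiguous A/C/G/T sequences are handled.

```python
_PAIR = str.maketrans("ACGT", "TGCA")


def _revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous A/C/G/T sequence."""
    if set(seq) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported in this sketch")
    return seq.translate(_PAIR)[::-1]


def tailed_probe(target: str, tail: str) -> str:
    """Probe (5'->3') = non-complementary 5' tail + segment complementary to target."""
    _revcomp(tail)  # validates the tail alphabet
    return tail + _revcomp(target)


def complementarity(segment: str, target: str) -> float:
    """Fraction of positions at which an equal-length probe segment pairs with target."""
    if len(segment) != len(target):
        raise ValueError("segment and target must have equal length")
    expected = _revcomp(target)
    return sum(p == e for p, e in zip(segment, expected)) / len(target)


if __name__ == "__main__":
    target = "TTGTGGGAACACAGTTGG"  # fragment of SEQ ID NO. 1303
    probe = tailed_probe(target, "GCGCGC")  # hypothetical 5' tag
    print(probe, complementarity(probe[6:], target))
```

The complementarity fraction gives a simple way to see how many interspersed non-complementary bases a probe carries before deciding whether it retains sufficient complementarity under the chosen hybridization conditions.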
The exact length and sequence of the probe will depend on the hybridization conditions (e.g. temperature, 
salt conditions, etc.). For example, for diagnostic applications, depending on the complexity of the analyte 
sequence, the nucleic acid probe typically contains at least 10-20 nucleotides, preferably 15-25, and more 
preferably at least 30 nucleotides, although it may be shorter than this. Short primers generally require 
cooler temperatures to form sufficiently stable hybrid complexes with the template.

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al. [J. Am. 
Chem. Soc. (1981) 103:3185], or according to Urdea et al. [Proc. Natl. Acad. Sci. USA (1983) 80:7461], or 
using commercially available automated oligonucleotide synthesizers.

The chemical nature of the probe can be selected according to preference. For certain applications, DNA or 
RNA is appropriate. For other applications, modifications may be incorporated; e.g., backbone 
modifications, such as phosphorothioates or methylphosphonates, can be used to increase in vivo half-life, 
alter RNA affinity, increase nuclease resistance, etc. [e.g. see Agrawal & Iyer (1995) Curr. Opin. Biotechnol. 
6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as peptide nucleic acids may also be used 
[e.g. see Corey (1997) TIBTECH 15:224-229; Buchardt et al. (1993) TIBTECH 11:384-386].
Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting small 
amounts of target nucleic acid. The assay is described in Mullis et al. [Meth. Enzymol. (1987) 155:335-350] 
and US patents 4,683,195 and 4,683,202. Two "primer" oligonucleotides hybridize with the target nucleic 
acids and are used to prime the reaction. The primers can comprise sequence that does not hybridize to the 
sequence of the amplification target (or its complement) to aid with duplex stability or, for example, to 
incorporate a convenient restriction site. Typically, such sequence will flank the desired Streptococcus sequence. 
A thermostable polymerase creates copies of target nucleic acids from the primers using the original target 
nucleic acids as a template. After a threshold amount of target nucleic acids is generated by the 
polymerase, they can be detected by more traditional methods, such as Southern blots. When using the 
Southern blot method, the labelled probe will hybridize to the Streptococcus sequence (or its complement).
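The exponential character of the amplification explains why very small starting amounts reach a detectable threshold. Assuming a constant per-cycle efficiency E (1.0 meaning perfect doubling; real reactions are usually somewhat lower), the copy number after n cycles is N0(1 + E)^n. A minimal sketch; the function names, the starting copy number, efficiency, and detection threshold in the usage line are hypothetical.

```python
def copies_after(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected target copies after `cycles` rounds: N0 * (1 + E) ** n."""
    if n0 <= 0 or cycles < 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need n0 > 0, cycles >= 0 and 0 < efficiency <= 1")
    return n0 * (1.0 + efficiency) ** cycles


def cycles_to_threshold(n0: float, threshold: float, efficiency: float = 1.0,
                        max_cycles: int = 60) -> int:
    """Smallest whole number of cycles for which the copy number reaches threshold."""
    if n0 <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need n0 > 0 and 0 < efficiency <= 1")
    copies, cycles = n0, 0
    while copies < threshold:  # step cycle by cycle to avoid log/ceil rounding issues
        if cycles >= max_cycles:
            raise ValueError("threshold not reached within max_cycles")
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical: 100 starting copies, 90% efficiency, 1e10 copies needed for detection.
    print(cycles_to_threshold(100.0, 1e10, efficiency=0.9))
```

The constant-efficiency model ignores the plateau that real reactions reach as reagents are consumed, so it is only meaningful for the exponential phase.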

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook et 
al. [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified and 
separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid support, such 
as nitrocellulose. The solid support is exposed to a labelled probe and then washed to remove any 
unhybridized probe. Next, the duplexes containing the labeled probe are detected. Typically, the probe is 
labelled with a radioactive moiety.
SEQUENCE LISTING



SEQ ID NO. 1301: SAG0466 FROM THE 2603V/R GBS STRAIN

CTCCTGCCCCTGCAATGGCAGTTAGACCCATAGGTTTATTTTTATATTTTAATGCCTGCATAAGATGAAGGATATTAATAATTCCT
GAGCAGGCATAAGGGTGTCCGT AAGCTAATGT CCCTC CAAAAAT ATTGAATTTTT CT CT CT CTT CAGGATAAT AATGATT AAAT AG 
AGCATCAATCGCTGCAAATGGTTCATTCCATTCAATTGCATCATAATCCGATATTTTAGTATGAGTTTCTGTTAATAGTTTTTCCG
TAGCCGTGTGAACCAATTCTGGACTAAGCTTGGGATCTCCTGCTACTTCTACAATGTGAACAATCCGGAATTCTGTTTTCTGACTC 
TGAAGCGTTAGAAATGCAGCAGCATCGTGCATTAAACAAACATTTCCAATAGTGAGCAAAGGTGAATTTTCCATCAATCTTGGTAA 
TTTTTGAAAAAATGTTtCTTTTaGTTTTCTAACGCCTTGATCTCGCATCCCTTCCATTGGTAAGATTACyTCTTCTAAATAGCCAC 
CTTGTTTAGCTGTTAAGGCGCGTTTATGGCTCAAGAATGCCAATTTATCTAACATTTCTCTTCTAAAaCCATATTTTTGACAGACT
CTCTGGGCCCCTTCTAACATTACAGTTTCAGCATAAGAGTCAGGAGAAAACTGAGCAACTGTATATTCTCCGTTACGATTATCTTC 
TTTAGCATAACGTCTCATAGGTTGAAGAGAACTACTTTCAATCCCCCCAACAAGAACTTTTTCATTAATACCGGTACTGATTTTTA 
GATAACCAAAAAACAAGGCAGAACTTGATGAAGCACACTGCATATCAATCGTTTGTACTGGAATATAGGATTCATAATCAGAAAAA
AGAGTCATCAAACGACCAATATTGCCCCCAGTACCAACTGTGTTCCCACAAATAATACTATCAATGTTAGATTCTGATTCTATTTT 
TTTTATTTGATTTAAAAGGTGTGCTCCTAAAAGTTCTGGACGGTA&GTTTAAATTGCTT 

SEQ ID NO. 1302: SAG0466 FROM THE M732 GBS TYPE III STRAIN 

TCGGTATAA^GGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATAGAATCA 
GAATCTAATATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTA 
TGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTG 
CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGT 
AACGGAGAA.TATACCGTTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAATGTTAGAAGGGGCACAAAGAGTCTGTCAAAR. 
ATATGGTTTTAGAAGAGAAATGTTAGATAAATTGGCATTCTTGAGCCATAAACGCGCCTTAACAGCTAAACAAGGTGGCTATTTAG 
AAGAGGTAATCTTACCAATGGAAGGGATGCGAGATCAAGGCGTTAGAAAACTAAAAGAAGCATTTTTTCAAAAATTACCAAGATTG
ATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAAC 
AGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTAT 
TAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTTTATTTAATCAT
TATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTA 

SEQ ID NO. 1303: SAG0466 FROM THE 090 GBS TYPE Ia STRAIN

TTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTATGAATCCTATATTCCAGTACAAA 
CGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTGCCGGTATTAATGAAAAAGTTCTT 
GTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGTAACGGAGAATATACCGTTGCTCA 
GTTTTCTCCTGACTCTTAkGCTGAAACTGTAATGtTAGAAGGGGCACAAAGAGTCTGTCAAAAATATGGTTTtAGAAGAGAAATGT 
TAGATAAATTGGCATTCTTGAGCCATAAACGCGCCTTAACAGCTAAACAAGGTGGCTATTTAGAAGAGGTAATCTTACCAATGGAA 
GGGATGCGAGATCAAGGCGTTAGAAAACTAAAAGAAGCATTTTTTCAAAAATTACCAAGATTGATGGrAAATTCACCTTTGCTCAC 
TATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTwACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTG 
TAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATA
TCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAA
ATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGG 

SEQ ID NO. 1304: SAG0466 FROM THE COH1 GBS TYPE Ia STRAIN

ATCGGTATAAAAGGGAAGCAATTTAAAATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATAGAATCA 
GAATCTAATATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTA 
TGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGGTATCTAAAAA 

SEQ ID NO. 1305: SAG0466 FROM THE CJB GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT)

TTTTCAAAAATTACCAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTC 
TAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTT 
CACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGC 
GATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTT 
AATGCCTGCTCAGGAATTATTAATATCC 

SEQ ID NO. 1306: SAG0466 FROM THE CJB110 GBS NONTYPEABLE STRAIN

GGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATATAACCAGA 
ATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTATG 
AATCCTATATTC 

SEQ ID NO. 1307: SAG0466 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT)

CAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGT 
CAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGA 
AAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTTTAT
TTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGA 
ATTATTAATATCCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGCCTAACTGCCATTGCAGGGGCA

SEQ ID NO. 1308: SAG0466 FROM THE 18RS21 GBS TYPE II STRAIN 

CCTTAACAGTTAAACAAGGTGGCTATTTAGAAGAGGTAATCTTACCAATGGAAGGGATGCGAGATCAAGGCGTTAGAAAACTAAAA
GAAACATTTTTTCAAAAATTACCAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGC
TGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAG
AATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCA
TTTGCAGCGATTGATGCTCTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGACATTAGCTTACGG 
ACACCCTTATGCCTGCTCAGGAATTATTAATATCCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTG 
CCATTGCAGGGGCAG 

SEQ ID NO. 1309: SAG0466 FROM THE 18RS21 GBS TYPE II STRAIN 

TCGGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTTAAATCAAATAAAAAAAATAGAATCA 
GAATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTA 
TGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTA 
CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTATGCTAAAGAAGATAATCGT
AACGGAGAATATACAGTTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAATGTTAGAAGGGGCCCAGAGAGTCTGTCAAAA 
ATATGGTTTTAGAAGAGAAATGTTAGATAAATTGGCATTCTTGAGCCATAAACGCGCCTTAACAGCTAAACA

SEQ ID NO. 1310: SAG0466 FROM THE H36b GBS TYPE Ib STRAIN

TTTGGGCTACGAACACCTATCGGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTTAAATCA 
AATAAAAAAAATAGAATCAGAATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGA 
TGACTCTTTTTTCTGATTATGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTT 
GGTTATCTAAAAATCAGTACCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTA 
TGCTAAAGAAGATAATCGTAACGGAGAATATACAGTTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAATGTTAGAAGGGG 
CCC 

SEQ ID NO. 1311: SAG0466 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

GAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGA 
ATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAA
CAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTCTATTTAATCATTAT 
TATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGACATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTATTAATAT
CCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCATTGCAGGGGCAGGA

SEQ ID NO. 1312: SAG0466 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

CCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGAT 
TGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTC
ATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTTTATTTAATCATTATTATCCTGAA
GAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTATTAATATCCTTCATCT 
TATGCAGGCATTAAAATATAAAAATAAACCTATGGGTTCTAACTGC 

SEQ ID NO. 1313: SAG0466 FROM THE M781 GBS TYPE III STRAIN 

GCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATAGAATCAGAATCTAATATTGATA 
GTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTATGAATCCTATATTCCA 
GTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTGCCGGTATTAATGAAAA 
AGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGTAACGGAGAATATACCG 
TTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAATGTTAGA 

SEQ ID NO. 1314: SAG0466 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

CCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGAT 
TGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTC 
ATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTCTATTTAATCATTATTATCCTGAA 
GAGAGAGAAAAATTCAATATTTTTGGAGGGACATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTATTAATATCCTTCATCT 
TATGCAGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCATTGCAGGGGC 

SEQ ID NO. 1315: SAG0466 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT)

GCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTC 
ACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTCATACT
AAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTCTATTTAATCATTATTATCCTGAAGAGAG
AGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTATTAATATCCTTCATCTTATGC 
AGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCATTGCAGGGGCAGGA 

SEQ ID NO. 1316: SAG0466 FROM THE JM9130013 GBS TYPE VIII STRAIN 

TTTGGGCTACGAACACCTATCGGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTTAAATCA
AATAAAAAAAATAGAATCAGAATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGA 
TGACTCTTTTTTCTGATTATGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTT 
GGTTATCTAAAAATCAGTACCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTA
TGCTAAAGAAGATAATCGTAACGGAGAATATA

SEQ ID NO. 1401: SAG0471 FROM THE 18RS21 GBS TYPE II STRAIN

TTAAATTTGGTATCTTGACGCTTGAGGGAGAAGTACAAGAAAAATGGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATC
GTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTC 
TCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTA
TTGAAAAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCC 
AATAATCCCGACGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGGTGTTATCGCAGATGGTAACCTCATCCATGGTGTTGC 
AGGAGCAGGTGGAGAAATTGGGCATATGATTGTTGATCCAGAAAATGGATTTACGTGCACATGTGGTAACAAAGGCTGCCTTGAGA
CAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAGGGTTCGTCTGCCATTAAAGCAGCGATT 
GACACCGGTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGT 
ATCACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAG 
CAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATTAAGAT 

SEQ ID NO. 1402: SAG0471 FROM THE 090 GBS TYPE Ia STRAIN

CGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTT 
CTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTT 
ATTGAAAAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGC 
CAATAATCCCGATGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGGTGTTATCGCAGATGGTAACCTCATCCATGGTGTTG 
CAGGAGCAGGTGGAGAAATTGGGCATATGATTGTTGATCCAGAKAATGGATTTACGTGCACATGTGGTAACAAAGGCTGTCTTGAG 
ACAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAAGGTTCGTCTGCCATTAAAGCAGCGAT 
TGACAACGGTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTG 
TATCACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCA 
GCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTG

SEQ ID NO. 1403: SAG0471 FROM THE COH1 GBS TYPE Ia STRAIN

ACAAGAAAAATGGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATCGTT 
TGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTA
ACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGA

SEQ ID NO. 1404: SAG0471 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

TTGGTATCTTGACGCTTGAGGAGAAGTACAAGAAAAATGGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTG 
ATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGGTCTCCAGGA
GCTGTTGATAGAACTAGTAAAAC

SEQ ID NO. 1405: SAG0471 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

CACCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGT 
CGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTA 

SEQ ID NO. 1406: SAG0471 FROM THE 2603V/R GBS TYPE V STRAIN 

GGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTAT 
GGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTG 

SEQ ID NO. 1407: SAG0471 FROM THE H36b GBS TYPE Ib STRAIN

GGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATG 
GATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTTT 
AATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAA 
TGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATCCCGACGTTGTTTTCGTAACC 

SEQ ID NO. 1408: SAG0471 FROM THE H36 GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

GAGACAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAGGGTTCGTCTGCCATTAAAGCAGC
GATTGACAACGGTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAAC
GTGTATCACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCA 
GCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACA 

SEQ ID NO. 1409: SAG0471 FROM THE M732 GBS TYPE III STRAIN 

ACAAGAAAAATGGGCAATTGAGACCATACTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTG 
AGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAAC
AGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAAAAAGAAGTTGGAATTCCATTTTTTATTGATA 
ACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATCCCGATGTTGTTTTCGTAACCCTCGGAACA 
GGAGTAGGTGGAGGTGTTATCGCAGATGGTAACCTCATCCATGGTGTTGCAAGAGCAGGTGGAGAAATTGGGCATATGATT 

SEQ ID NO. 1410: SAG0471 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 
CAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATT
AAGATTGCTGAACTAGGTAATGAT 

SEQ ID NO. 1411: SAG0471 FROM THE M781 GBS TYPE III STRAIN

AGAAGTACAAGAAAATGGGCAATTGAGACCATACTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATC
GTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACA 
GTAACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAAAAAGAAGTTGGAATTCCATTTTTTAT 
TGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATCCCGATGTTGTTTTCGTAACCCTCG
GAACAGGAGTA

SEQ ID NO. 1412: SAG0471 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGTTA 
CCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAAT 
TTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAA 

SEQ ID NO. 1413: SAG0471 FROM THE 090 GBS TYPE Ia STRAIN

AAATTTGGTATCTTGACGCTTGAGGGAGAAGTACAAGAAAAATGGGCATTGAGACCATACTTAGAAAACGGAAGACATATCGTTTC 
TGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAG 
GAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAA 
AAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAA
TCCCGACGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGG

SEQ ID NO. 1414: SAG0471 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGT 
TACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGA 
ATTTTTACGTAGTCGCGTTGAGAAATACTTTATCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATTAAGATTG

SEQ ID NO. 1415: SAG0471 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

GTTATCGCAGATGGTAACCTCATCCATGGTGTTGCAGGAGCAGGTGGAGAAATTGGGCATATGATTGTTGATCCAGAAAATGGATT 
TACGTGCACATGTGGTAACAAAGGCTGCCTTGAGACAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAAC 
AATATGAGGGTTCGTCTGCCATTAAAGCAGCGATTGACCACGGTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGAT
GGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCC 
TGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTT
TCCCACAAGTTAAAAAGTCAACTAA

SEQ ID NO. 1416: SAG0471 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

TGGTATCTTGACGCTTGAGGGAGAAGTACAAGAAAAATGGGCAATTGAGACCATACTTAGAAAACGGAAGACATATCGTTTCTGAT 
ATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGC 
TGTTGATAGAACTAGTAAAACAGTCACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAG
AAGCTGGAATTCCATTTTTTATTG

SEQ ID NO. 1417: SAG0471 FROM THE 2603V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT)

AGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTC
GCGTTGAGAAATACTTTGTCACATTTGTTTTCCCACAAGGT

SEQ ID NO. 1501: SAG0492 FROM THE 1169NT1 GBS NONTYPEABLE STRAIN

TGACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATC
TCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGAATTGATATAACAGACAAAAAAAATGATATTTTTAAAATGCGCGAA
AAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAA 
GGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAG 
CTAGCTTATCTGGAGGACAACAACAACGGATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCT 
ACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGT 
CACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGCATTATTGTGAGCAAGGGACCCCTAA
GGAAGTAT

SEQ ID NO. 1502: SAG0492 FROM THE 18RS21 GBS TYPE II STRAIN 

TTGGGAAAAATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGT 
AAGTCAACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAA
AAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAA
ATATTACTTTATCACCTATTAAGACAAAGGGGCTTTCTAATCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGGA 
CTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAA 
TCCTCATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG 
CTAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGACGCA
GAAATTAT

SEQ ID NO. 1503: SAG0492 FROM THE 2603V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT)

AAAAATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTC 
AACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGA
ATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATT 
ACTTTATCACCTATTAAGACAAAGGGGCTTTCTAATCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGGACTCAA
AGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTG
ATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAA 
TCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGAAT 
TATTGTTGAGCAAGGGGCCC

SEQ ID NO. 1504: SAG0492 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATT 
TTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATA
TTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTA 
TCACCTATTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAA
GGCTAATGCTTATCCAGCAAGCTTATCTGGAGGACAACAACAACGGATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCC 
TTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGT 
ATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGGATTATTGT 
TGAGCAAGGGACCCCTAAGAAAGTAT 

SEQ ID NO. 1505: SAG0492 FROM THE 090 GBS TYPE Ia STRAIN

TGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACA 
GTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTT 
CAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGA 
CAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCTAGCTTATCTGGAGGGCAACAACAA 
CGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGT 
AGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTG 
AAGTAGCGGATCGTGTCATTTTTATGGATGCAGGCATTATTGTTgAsCAAGGGACCCCTAAGGAAGTA 

SEQ ID NO. 1506: SAG0492 FROM THE A909 GBS TYPE Ia STRAIN

CAATACAAGGACTTCATAAAAGTTTTGGGAAAAATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTT
ATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTT
TGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTAT
TTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCA 
TATGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGC 
TATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAG 
TCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCG 
GATCGTGTCATTTTTATGGATGCAGGAATTATTGTgAGCAAGGGGCCCCTAAGGAAGTATTTGAGCAGACAAAAGAAATCCGCACA 
AGAGATTTCTT 

SEQ ID NO. 1507: SAG0492 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

GACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATCT 
CTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAA
AAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAG
GGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGC
TAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCTA 
CTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGTC 
ACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCTTTTTATGGATGCGGGAATTATTGTGAGCAAGGGACC 

SEQ ID NO. 1508: SAG0492 FROM THE H36b GBS TYPE Ib STRAIN

ATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACA 
TTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGA
TATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTT
TATCACCTATTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGGACTCAAAGAG
AAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGT 
CCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTG 
GTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCASGAATTATT 
GTTGAGCAAGGGGCCCCTAAGGAAGTAT 

SEQ ID NO. 1509: SAG0492 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

GGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTT 
TAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATT
TTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATC
ACCTATTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGG
CTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTT
CTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTAT
GACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGAATTATTGTTG
AGCAAGGGGCCCCTAAGGAAGTATTTAGCAAAACAAAAGAAAT

SEQ ID NO. 1510: SAG0492 FROM THE M732 GBS TYPE III STRAIN

GGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAG 
TGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTC 
AATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGAC
AAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCAAGCTTATCTGG

SEQ ID NO. 1511: SAG0492 FROM THE COH1 GBS TYPE Ia STRAIN

ATTGACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAA 
TCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCG 
AAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACA 
AAGGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCC
AGCAAGCTTATCTGG

SEQ ID NO. 1601: SAG0767 FROM THE M781 GBS TYPE III STRAIN 

TGGTCGCTCTGTCGGAACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAA 
ACTTATTTTATCACGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAA
CCAAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAA
TGGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCT
ATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGGTGTACCTCAGGTTGCATATCAAACTTATTTTGAGGGTGATGATTT
GGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTGTAAAACCGGCTAATATGGGGTCATCAGTAGGTATTT
CAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCGTG
ACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGTTGTTAAAGACGTCGATTT
CTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGC
GTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATC
TTCTTAAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCTCTGCTTTGGGAAAATATGGGGCTAACTTATAG 
TGATTTGATTG 

SEQ ID NO. 1602: SAG0767 FROM THE 090 GBS TYPE Ia STRAIN

AAACCGGGCATTGTATTCAGTTCGTTTAAGAAGACTTGTCCATCTTTCGTCAAAAAGAAATCACAGCGTGATAAACCACAAGCCCC 
GATTGCTTTAAAAGCTTTACTTGCATATTGACGCATTGCTTCCATAGTTGCTTCATCAACTTTAGCTGGAATATCCATAGTAATTT
TATTATCAATATATTTGGCGTCATAGTCATAGAAATCGACGTCTTTAACGACTTCGCCAGGAAAAGTTGTCTTAACATCATTATTG
CCTAAAATACCTACTTCAATTTCACGAGCTGTCACGCCTTGTTCAATCAAAATACGGCTATCATACTTGAGAGCTAAGTCAATksC 
AGAGCGAAGTGAGGATTCATCTGTCGCTTTTGAAATACCTACTGATGACCCCATATTAGCCGGTTTTACAAAAATTGGGAAACTTA 
AAGTTTCTAAAGAGAGTTTAATCGCATGTTCCAAATCATCACCCTCAAAATAAGTTTGATATGCAACCTGAGGTACACCTACTGTT 
GCAAGGACTTGTTTTGTTGTAATTTTATCCATAGCCACGCTTGAAGATAGAATATTAGTCCCAACATAAGGCATCCTTAAAACTTC 
TAAAAATCCTTGGATAGAACCATCTTCCCCCATTGGTCCATGTAAAACGGGGAAAACAATTGCATTATCATCATAGATATCACTTG
GACGAACCATTTTGTCTAAATCAACAGTTTGGTTTGTCATTAACTTTTCATCTGAAGATGGCATTTCATCAAATTCTTGTGTTTTA
ATAAATTGACCTACTTGCGTG

SEQ ID NO. 1603: SAG0767 FROM THE COH1 GBS TYPE Ia STRAIN

TCGCTCTGCGGAACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTT
ATTTTATCACGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAA 
ACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAATGGG
GGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCTAT 

SEQ ID NO. 1604: SAG0767 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

CGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGG
AAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGAT
GGACAAATCTTCTTAAACGAACTGAATACAATGCCC

SEQ ID NO. 1605: SAG0767 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

AACGTGAAGTATCTGTACTGCTCTGCAGAAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCA 
CGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAA

SEQ ID NO. 1606: SAG0767 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT)

CTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGAT 
AGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCC
TGGCGAAGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAG
TTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGAT
TTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCTCTGCT 
TTGGGAAAAT 

SEQ ID NO. 1607: SAG0767 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT)

TTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAAT 
AATGATGTTAAGACAACTTTTCCTGGCGAAGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAAT
TACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGG 
CTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTTTACT 
CAGTGGTCAATGTATCCCCTGCTTTGGGAAAAGTATGGGGCTAACCTT 

SEQ ID NO. 1608: SAG0767 FROM THE 18RS21 GBS TYPE II STRAIN 

ATCTGTACTGTCTGCAGAAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAGGT 
CAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGAGAAACCAAACTGTTGATTTAGACAAAAT
GGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAATGGGGGAAGATGGTTCTATCCAAG 
GATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAA 

SEQ ID NO. 1609: SAG0767 FROM THE 2603V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT)

GGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGGTGTACCTCAGGTTGCATATCAAACTTATTTTGAGGGTGATG 
ATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTGTAAAACCGGCTAATATGGGGTCATCAGTAGGT 
ATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGG
CGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGTCGTTAAAGACGTCG 
ATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCA 
ATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGAATGGAC 
AAATCTTCTTAAACGAACTGAAATAC

SEQ ID NO. 1610: SAG0767 FROM THE 2603V/R GBS TYPE V STRAIN 

TCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAGGTCA
ATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAAACTGTTGATTTAGACAAAATGG
TTCGTCCAAGTGATATCTATGATGATAAT 

SEQ ID NO. 1611: SAG0767 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

AAAACCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCA 
AGTATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACA 
ACTTTTCCTGGCGAAGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCC 
AGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCAC 
GCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTAT 
CCCCTGCTTTGGGAAAATATGGGGCTAACTTATAG 

SEQ ID NO. 1612: SAG0767 FROM THE H36b GBS TYPE Ib STRAIN

CGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCA 
AGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAAACTGTTGATTTAG
ACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAATGGGGGAAGATGGTTCT 
ATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCTATGGATAAAATTACAAC 
AAAACAAGTCCTTGCAACAGTAG 

SEQ ID NO. 1613: SAG0767 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

ATGCGATTAAACTCTCTTTAGAACCTTTAAGTTTCCCAATTTTTGTAAACCCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAA 
GCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGC 
TCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGTTGTTAAAGACGTCGATTTCTATG 
ACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAA
TATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTT 
AAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCTCTGCTTTGGGAAAATATGGGGCTAACTT

SEQ ID NO. 1614: SAG0767 FROM THE M732 GBS TYPE III STRAIN 

GTCATGCCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAGGTCAATTTATTAAAACACAAGAAT 
TTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTAT
GATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAATGGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGAT 
GCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGGTGTAC
CTCAGG 

SEQ ID NO. 1615: SAG0767 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

TTTTGAGGGTGATGATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTGTAAAACCGGCTAATATGG 
GGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATT 
TTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGT
CGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAG
CAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTG 
ACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCCCTGCTTTGGGAAAA 
TATGGGGCTAACTTATAGTGA 

SEQ ID NO. 1616: SAG0767 FROM THE A909 GBS TYPE Ia STRAIN

TGGTCGCTCTGCGGAACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAA 
CTTATTTTATCACGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAAC
CAAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAAT 
GGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCTA 
TGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGG

SEQ ID NO. 1617: SAG0767 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

AAGCAGGGGATACATTGACCACTGAGTAAAACCGGGCATTGTATTCAGTTCGTTTAAGAAGATCTGTCCATCTTTCGTCAAAAAGA
AATCACAGCGTGATAAACCACAAGCCCCGATTGCTTTAAAAGCTTTACTTGCATATTGACGCATTGCTTCCATAGATGCTTCATCA 
ACTTTAGCTGGAATATCCATAGCAATTTTATTATCAATATATTTGGCG 

SEQ ID NO. 1701: SAG1086 FROM THE 1169NT1 GBS NONTYPEABLE STRAIN

TTTAAAGGTTGATTCCTTTTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAG
AAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATG
ATATTTGCTAAAAAGGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGWTACGAG 
TCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTA 
AAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGT 
GATTTGTTAGAAAAAACAGGTGTTCCAGT

SEQ ID NO. 1702: SAG0767 FROM THE 18RS21 GBS TYPE II STRAIN 

TTTAGGTGAGAACATTTTAAAGGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTG
CTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCACCAGCAGTGTACGCAGCTCAAGCA
TTGGGCGkACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTAC 
AAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAA 
ACGGTCAAGCGGCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCT 
TTCCAAGATGGGCGTGATTTGTTAGAAAAAACA

SEQ ID NO. 1703: SAG0767 FROM THE H36bl GBS TYPE Ib STRAIN

AAGAACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAG 
TTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAAT
TGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTA
TCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACT 
GTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGC 
TGGTATCGGAATCYTTATTGAAAAATCTTTCCAAGATGGGCGTGATT

SEQ ID NO. 1704: SAG0767 FROM THE M732 GBS TYPE III STRAIN 

ATTCTTTTTTGACTATCAGGTAAATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTA 
CGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAA 
AAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTAT 
TGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTG 
AAATTATTGGTCAAGCTGAAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTGTTAGAA 
AAAACAGGTGTTCCGGTTACTTCTCTTGCTCGT 

SEQ ID NO. 1705: SAG0767 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GAACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAAATTTTGAGTT 
AATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTG 
CGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATR 
TTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGT 
ACTCATCATTGATGACTTTTTAACAAACGGTCAAGC 

SEQ ID NO. 1706: SAG0767 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

ACATTTTAAAGGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATAT 
AAAGAAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCACCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACC
AATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTA 
CGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACMGTCYAGCG 
GCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGG 
GCGTGATTTGTTAGAAAA 

SEQ ID NO. 1707: SAG0767 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

ACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTAA
TGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCG 
CCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTT 
AACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTAC 
TCATCATTGATGACTTTTTAGCAAACGGKCAAGCGGSTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTA

SEQ ID NO. 1708: SAG0767 FROM THE COH1 GBS TYPE Ia STRAIN

TTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAAATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAG
AAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATG 
ATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAG 
TCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTA 
AAGGATTACTTGAAATTATTGGTCAAGCTGAAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGT
GATTTGTTAGAAAAAACAGGTGTTCCGGTTAC 

SEQ ID NO. 1709: SAG0767 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

GCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGC 
ATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTA 
CAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCA 
AACGGTCAAGCGGCTAAAGGATTACTTGAAATTTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAAT 
CTTTCCAAGATGGGCGTGATTTGTTAGAAAAAACAGGTGTTCCAGT 

SEQ ID NO. 1710: SAG0767 FROM THE 2603 V/R GBS TYPE V STRAIN 

AACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTA
ATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGC
GCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCT 
TAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTA 
CTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGG 
TATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTGTTAGAAAAAACAGGTGTTCCAG 

SEQ ID NO. 1711: SAG0767 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

ACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAA 
AAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTA 
TTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTT
GAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGA 

SEQ ID NO. 1801: SAG1600 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

AATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAATT 
TCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGCCTGGCAAGAAATTAAAGAAAAACTA 
GACGTGCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACTAATTCAGGGAAAGTTGGTATTATAGGTAC 
TCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATACTGCTGTGGTATCCCTTGCTTGTCCGA 
AATTTGTTCCAATTGTGGAATCAAATCAGATGTCTTCTAGTTTAGCCAAAAAGGTGGTTTATGAAACGTTGTCCCCATTAGTTGGT 
AAATTAGATACTTTAATTTTAGGTTGCACGCATTATCCCTTATTACGTCCCATCATTCAAAATGTTATGGGGGCTGAGGTTAAATT 
AATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTATTTTGAGATAAACCATAATTGGCAAAATAAACACG 
GTGGTCATCACTTTTACACAACCGCCAGCCCAA

SEQ ID NO. 1802: SAG1600 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

AAATGTTCCGTCAACTTCCAGAAGAGGAAGTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAG 
ATTAGAGAGTTTACCTGGCAGATGGTTAACTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGC 
AGTTGCCTGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAA
CTAATTTAGGGAAAGTTGGTATTATAGGTACTCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCA 
AATACTGCTGTGGTATCCCTTGCTTGTCCGAAATTTGTTCCAATTGTGGAATCAAATCAGATGTCTTCTAGTTTAGCCAAAAAGGT 
GGTTTATGAAACGTTGTCCCCATTAGTTGGTAAATTAGATACTTTAATTTTAGGTTGCACGCATTATCCCCTATTACGTCCCATCA 
TTCAAAATGTTATGGGGGCTGAGGTTAAATTAATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTATTTT 
GAGATAAACCATAATTGGCAAAATAAACACGGTGGTCATCACTTTTACACAACCGCCAGCCCAAAAGGTTTTAAAGAAA

SEQ ID NO. 1803: SAG1600 FROM THE 090 GBS TYPE Ia STRAIN

AATCTTCATTGGAGACCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTtACCTGGCAGATGGTTAATTT
CTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGCCTGGCAAGAAATTAAAGAAAAACTAG 
ACATACCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACTAATTCAGGGAAAGTTGGTATTATAGGTACT 
CCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATACTGCTGTGGTATCCCTTGCTTGTCCGAA 
ATTTGTTCCAATTGTGGAATCAAATCAGATGTCTTCTAGTTTAGCCAAAAAGGTGGTTTATGAAACGCTGTCCCCATTAGTTGGTA 
AATTAGATACTTTAATTTTAGGTTGCACGCATTATCCCTTATTACGTCCCATCATTCAAAATGTTATGGGGGCTGAGGTTAAATTA 
ATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTATTTTGAGATaAmCCATaATTGGsmAAATAAACACGG
TGGTCATCACTTTTACACAACCGsCAGCCCAAAAGGTTTTTAAGGAAATTGCAGAACAATGGCTTAATCAAGAAATAAAT 

SEQ ID NO. 1804: SAG1600 FROM THE A909 GBS TYPE Ia STRAIN

GCGGTTGTGTAAAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACAGAAATATCACG 
AACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTAATAGGGGATAATGCGTGC 
AACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAAACTAGAAGACATCTGA 
TTTGATTCCACAATTGGAACAAATTTCGGACAAGCAAGGGATACCACAGCAGTATTTGGAGACAAAGCTTGAATTTTTTGACGATA
AGCATCTGATTTAACAGTCATGGGAGTACCTATAATACCAACTTTCCCTAAATTAGTTGATTTGATAGCTGCGCTAGCTCCTGGTA 
AAATAACGCCTAAAACAGGGATGTCTAGTTTTTCTTTAATTTCTTGCCAGGCAACTGCAGTTGCTGTATTACAAGCTATAACAATC 
ATCTTAACATTTTTAGTCAATAAGAAGTTAACCATCTGCCAGGTAAACTCTCTAATCTGTTGAGCAGGTCTAGGACCATACGGAGC 
TCTAGCCTGATCTCCAATGAAGATTACTTCCTCTTCTGGAAGTTGACGGAACATTTCCTTAACAACCGTTAAACCACCT 

SEQ ID NO. 1805: SAG1600 FROM THE COH1 GBS TYPE Ia STRAIN

TTCCGTCAACTTCCAAAATATGAAGTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAG 
AGAGTTTACCTGGCAGATGGTTAACTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTG 
CCTGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACTAAT 
TTAGGGAAAGTTGGTATTATAGGTACTCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATAC 
TGCTGTGGTATCCCTTGCTTGTCCGAAAT 

SEQ ID NO. 1806: SAG1600 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

GTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAA 
TTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGCCTGGCAAGAAATTAAAGAAAAAC 
TAGACATAC 

SEQ ID NO. 1807: SAG1600 FROM THE 1169NT1 GBS TYPE V STRAIN 

CTTTTGGGCTGGCGGTTGTGTAAAATTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACA 
GAAATATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATAATGGGACGTAATAGGGG
ATAATGCGTGCAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAATGTTTCATAAACCACCTTTTTGGCTAAACTAG
AAGACATCTGATTTGATTCCACAATTGGAACAAATTTCGGACAAGCAAGGGATACCACAGCAGTATTTGGAGACAAAGCTTGAATT
TTTTGACGATAAGCATCTGATTTAACAGTCATGGGAGTACCTATAA

SEQ ID NO. 1808: SAG1600 FROM THE 1169NT1 GBS TYPE V STRAIN 

GTAATCTTCATTGGGGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAA 
TTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTT

SEQ ID NO. 1809: SAG1600 FROM THE 18RS21 GBS TYPE II STRAIN 

GAAATGTTCCGTCAACTTCCAGAAGAGGAAGTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACA 
GATTAGAGAGTTTACCTGGCAGATGGTTAACTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTG 
CAGTTGCCTGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCA 
ACTAATTTAGGGAAAGTTGGTATTATAGGTACTCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGC

SEQ ID NO. 1810: SAG1600 FROM THE 18RS21 GBS TYPE II STRAIN

ATTTCTTTAAAACCTTTTGGGCTGGCGGTTGTGTAATATTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATA 
GTTCAATAAAACAGAAATATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGG
GACGTAATATGGGATAATGCGTGCAACCTAAAATTAAAGTA

SEQ ID NO. 1811: SAG1600 FROM THE 2603 V/R GBS TYPE V STRAIN 

ATTTCTTTAAAACCTTTTGGGCTGGCGGTTGTGTAATAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAAT 
AGTTCAATAAAACAGAAATATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATG 
GGACGTAATAGGGGATAATGCGTGCAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTT
TTTGGCTAAACTAGAAGACATCTGATTTGATTCCACAATTGGAACAA

SEQ ID NO. 1812: SAG1600 FROM THE M781 GBS TYPE III STRAIN 

GGCGGTTGTGTAAAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACAGAAATATCAC
GAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTAATAGGGGATAATGCGTG 
CAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAAACTAGAAGA 

SEQ ID NO. 1813: SAG1600 FROM THE M781 GBS TYPE III STRAIN

AATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAACT 
TCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGC 

SEQ ID NO. 1814: SAG1600 FROM THE JM9130013 GBS TYPE VIII STRAIN

TGGGCTGGCGGTTGTGTAAAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACAGAAA
TATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTAATAAGGGATAA
TGCGTGCAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAAACTAGAAGA 
CATCTGATTTGATTCCACAATTGGAACAAATTTCGGACAAGCAAGGGATACCACAGCAGTATTTGGAGACAAAGCTTGAATTTTTT
GACGATAAGCATCTGATTTAACAGTCATGGGAGTACCTATAATACCAACTTTCCCTGAA 

SEQ ID NO. 1901: SAG1680 FROM THE 2603 V/R GBS TYPE V STRAIN 

ATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTC 
GACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCAT 
CAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGT 
TTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCAT 
AGCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACTGAAACCTTGAGCTG
CTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAACGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCA
CCCACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACCACGAAT 
ACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATGTTTTTTT 
CTTGAAAAGAGGTATTCCACATTAACGGGGATAGAGAGTGGCGTGCAGG

SEQ ID NO. 1902: SAG1680 FROM THE H36b GBS TYPE Ib STRAIN

GTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAACAAA 
TCGTAACAATGCTGTTtCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAAC
TATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTA 
TTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCT 
GTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTA 
TTGTAATTATTTTATTTTTAGCACTGAAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAA
CGTCCGGTTCCACCTTGATTAACGATAGTATTTAGAGCACCCACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAAC 
ACTCTGTTTAAATGGCATTGAAACATTAACACCACGAATACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTT
CTACTTCAAATGTCAGATAGGCATAATTCATGTTTTTTTCTTGAAAAGAGGTATTCCACATTAACGGGGATAGAGAGTGGCGTGCA 
GGA 

SEQ ID NO. 1903: SAG1680 FROM THE M732 GBS TYPE III STRAIN 

CTGGTCTAATTGCCAATCCTGCACGCCACTCTCTATCCCCGTTAATGTGGAATACCTCTTTTCAAGAAAAAAACATGAATTATGCC 
TATCTGACATTTGAAGTAGAAGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGAGTATTCGTGGTGTTAATGTTTC
AATGCCATTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTA 
ATCAAGGTGGAACCGGACGTTTAGTAGGCCATATGACAGATGGCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCT
AAAAATAAAATAATTACAATAGCTGGTATTGGTGGTTCAGGTAAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGAGTTGCGGAAAT 
TAGATTATTTAATCGTAACAGCTCAAATTACGATAAGGTCATTGACTTATCAGATAAAATTAAAAAACAGTTTCAAATAAAGGTAG 
TCGTTGATTATCTAGAAAATAAGACAGCATTTAAAGACGCTATTAGAACTAGTCATTTTTATATTGATGCTACTAGTTTAGGAATG 
AGGCCATTAGATAATTATAGTTTAATTAACGATCCAGATATTTTAACACCGAATTTAGTAGTTGTCGACTT 

SEQ ID NO. 1904: SAG1680 FROM THE M781 GBS TYPE III STRAIN 

AAATCAGCATCCCTAGACATTATAAGCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAA 
CCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTA 
GTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTG 
AAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTC 
CCTCCATAGCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACTGAAACCT 
TGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAACGTCCGGTTCCACCTTGATTAACGATAGTATT
TACAGCACCCACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACAC
CACGAATACTCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATG 
TTTTTTTCTTGAAAAGAGGTATTCCACATTAACGGGGATAGAGAGTGGCGTGCA 

SEQ ID NO. 1905: SAG1680 FROM THE 090 GBS TYPE Ia STRAIN

GTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACAGAGTGTTATCCCtTTGCTArATGATTT 
ATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTAATCAAGGTGGAACCGsACGTTTAGTAGGCCATATGACAGATG
GCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAGTTACAATAGCTGGTATTGGTGGTTCAGGT 
AAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGAGTTGCGGAAATTAGATTATTTAATCGTAATAGCTCAAATTACGATAAGGTCAT 
TGACTTATCAGATAAAATTAAAAAACAGTTTCAAATAAAGGTAGTCGTTGATTATCTAGAAAATAAGACAGCATTTAAAGACGCTA
TTAGAACTAGTCATTTTTATATTGATGCTACTAGTTTAGGAATGArGCCATTAGATAATTATAGTTTAATTAACGATCCAGAAATT 
TTAACACCCAATTTAGTAGTTGTCGACTTGGTTTACAAGCCTAAAGAAACAGCATTGTTACGATTTGTTAGACAAAATGGAGTGAA
ACATGCTTATAATGGTCTAGGGATGCTGATTTATCAAGGAGCAGA

SEQ ID NO. 1906: SAG1680 FROM THE A909 GBS TYPE Ia STRAIN

CCCTAGACCATTATAATCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGA 
CAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCA 
ATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTT 
TTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAG
CTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACTGAAACCTTGAGCTGCT 
AAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAACGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCACC 
CACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACCACGAATAC 
CCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATGTTTTTTTCT 
TGAAAAGAGGTATTCCACATTAACGGGGATAG 

SEQ ID NO. 1907: SAG1680 FROM THE COH1 GBS TYPE Ia STRAIN

TGCACGCCACTCTCTATCCCCGTTAATGTGGAATACCTCTTTTAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGA 
AGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGAGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACAGAGTG 
TTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACT 

SEQ ID NO. 1908: SAG1680 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

ATTCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAA 
CAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTGGGTGTTAAAATTTCTGGATCGTTAATT
AAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGT 
CTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTG 
AGCTATTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAACTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCA 
GCTATTGTAACTATTTT 

SEQ ID NO. 1909: SAG1680 FROM THE CJB110 GBS NONTYPEABLE STRAIN

ACTCTCTATCCCCGTTAATGTGGAATACCTCTTTTCAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGAAGAGGGT 
AAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACAGAGTGTTATCCC 
TTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTAATCAAGGTGGAACCGGACGTTTAGTAG 
GCCATATGACAGATGGCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAGTTACAATAGCTGGT 
ATTGGTG 

SEQ ID NO. 1910: SAG1680 FROM THE 1169NT1 GBS TYPE V STRAIN 

ATTCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAA 
CAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATT 
AAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGT
CTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTG 
AGCTGTTACGAT 

SEQ ID NO. 1911: SAG1680 FROM THE 1169NT1 GBS TYPE V STRAIN 

ACTTCTCTATTCCCCGTTAATGTGGAATACCTCTTTTCAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGAAGAGG 
GTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACAGAGTGTTATC 
CCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTAATCAAGGTGGAACC 

SEQ ID NO. 1912: SAG1680 FROM THE 18RS21 GBS TYPE II STRAIN 

TCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCATCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAACA 
AATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAA 
ACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCT 
TATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAG 
CTGTTACGATTAAATAATCTAATTTCCGCAAC 

SEQ ID NO. 1913: SAG1680 FROM THE 18RS21 GBS TYPE II STRAIN 

ATGCCTATCTGACATTTGAAGTAGAAGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAAT
GTTTCAATGCCATTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTAT 
CGTTAATCAAGGTGGAACCGGACGTTTAGTAGGCCATATGACAGATGGCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCA
GTGCTAAAAATAAAATAATTACAATAGCTGGTATTGGTGGTTCAGGTAAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGAGTTGCG
G 

SEQ ID NO. 1914: SAG1680 FROM THE JM9130013 GBS TYPE VIII STRAIN 

CCCTAGACCATTATAAGTCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCG 
ACAACTACTAAATTGGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATC 
AATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTT 
TTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTATTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATA 
GCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAACTATTTTATTTTTAGCACTGAAACCTTGAGCTGC 
TAAAGCTTTAAAACAACCAATGCCATCTGTCAT 

SEQ ID NO. 2001: SAG1723 FROM THE COH1 GBS TYPE Ia STRAIN

ATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGAT
GTCATCAAATATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAA
AAAGGATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCA
ATGGCAGCAGCGAATTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGT
GCCGTCGGTTCCTTCAAAA

SEQ ID NO. 2002: SAG1680 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

TAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCG
ATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAA
TATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAA
ATTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGCAGCA
GCGAATTTACTACTGTCGTGCCTAAAGGCCACTATTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGT
CCCTTCAAAAAATCAACAATTGTGGGAG

SEQ ID NO. 2003: SAG1680 FROM THE 18RS21 GBS TYPE II STRAIN

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGATATT 
GTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAA
AAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAATTAC
AGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAA
TTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGTCCCTT
CAAAAAATCAACGATTGTGGGAGAGGT

SEQ ID NO. 2004: SAG1680 FROM THE 2603 V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

AAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGAT
ATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATA
TAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAAT
TACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGC
GAATTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGT

SEQ ID NO. 2005: SAG1680 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAATAATCGATTCGATATTGT
AGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAAAA
ATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAATTACAG
GAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATT
TACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGA

SEQ ID NO. 2006: SAG1680 FROM THE M781 GBS TYPE III STRAIN 

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGATATT
GTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAA
AAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTTAAAAAGGATAAATTA
CAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGA
ATTTACT

SEQ ID NO. 2007: SAG1680 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

TTGGTAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGA
TTCGATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCAT
CAAATATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGG
ATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGC
AGCAGCGAATTTACCACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGT
CGGCCCCTTCAAAAAATCAACG

SEQ ID NO. 2008: SAG1680 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGATATT
GTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAA
AAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAATTAC
AGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAA
TTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGA

SEQ ID NO. 2009: SAG1680 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

TAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCG
ATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAA
TATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAA
ATTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGCAGCA
GCGAATTTACTACTGTCGTGCCTAAAGGCCACTATTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGT

SEQ ID NO. 2010: SAG1680 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

AAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGA
TATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAAT 
ATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAA
TTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAG 
CGAATTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGTC 
CCTTCAAAAAATCAACG

SEQ ID NO. 2101: SAG0079 FROM THE 2603 V/R GBS TYPE V STRAIN

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT 
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGC 
AGATGTTGAAAAAGCGTTG 

SEQ ID NO. 2102: SAG0079 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT 
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGC 
AGATGTTGAAAAAGCGTTGCTAGAACTCAAA

SEQ ID NO. 2103: SAG0079 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT)

TGGTAAAGGGACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCGCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGG
CTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGATCAAGTAACAAACGGGATTGTA 
AAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGGTATCCACGTACTATTGAACAAGCACACGCCTT 
AGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTTATAGAGCGTTTGA 
GTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTAT 
CAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTCATATTGCTCAAGGAGAACCTATTCTTGAACACTATAG
TAAGCTTGGCCTTGTTACAGATATTGAAGGTAATCAAGAAATAA

SEQ ID NO. 2104: SAG0079 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAACCACGGGTTCGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGC 
AGATGTTGAAAAAGCGTTGCTAGAA

SEQ ID NO. 2105: SAG0079 FROM THE 2603 V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT)

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT 
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGC 
AGATGTTGAAAAAGCGTTG 

SEQ ID NO. 2106: SAG0079 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAATCTATTCTTGAACACTATCGAAAGCTTGGTCTTGTTACAGATATTGAAGGTAA

SEQ ID NO. 2107: SAG0079 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT)

AATCTTTTAACCACGGGTTTGCTTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT 
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAACCTATTCTTGAACACTATAG 

SEQ ID NO. 2108: SAG0079 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCTCACATCTCA 
ACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGT 
TCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATC
CACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTG 
GATCCAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACC 
AGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTC 
AAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCA 
GATGTTGAAAAAGCGTTGCTAG 

SEQ ID NO. 2109: SAG0079 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

CAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTT 
CCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCC 
ACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGG 
ATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCA
GTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCA
AGGAGAATCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAG
ATGTTGAAAAAGCGTTGCT 

SEQ ID NO. 2110: SAG0079 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT 
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT 
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTTAAACGTCGCTTGGACGTTAATATTGCT
CAAGGAGAACCTATTCTTGAACACTATAAAAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCA

SEQ ID NO. 2111: SAG0079 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

CTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCTCACATCTCAAC 
AGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTC 
CTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCA 
CGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGA 
TCCAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAG 
TAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAA 
GGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAGA 
TGTTGAAAAAGCGTTGCTAGAACTCAAA 

SEQ ID NO. 2112: SAG0079 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

AATCTTTTAATTACGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT 
CCACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT
GGATCCAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT
CAA 

>SEQ ID NO 2150: 090 frame: 1

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YRKLGLVTDIEGNQEITEVFADVEKALLELK 

>SEQ ID NO 2151: 114_1169NT frame: 2

GKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPDQVTNGIVKER
LAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIIN 
RKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVHIAQGEPILEHYSKLGLVTDI 
EGNQEI 

>SEQ ID NO 2152: 114_18RS21 frame: 1 

NLLTTGSPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YRKLGLVTDIEGNQEITEVFADVEKALLE 

>SEQ ID NO 2153: 114_2603 frame: 1 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YRKLGLVTDIEGNQEITEVFADVEKAL

>SEQ ID NO 2154: 114_A909 frame: 1 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH 
YRKLGLVTDIEG

>SEQ ID NO 2155: 114_A909 frame: 1

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH 
YRKLGLVTDIEG 

>SEQ ID NO 2156: 114_CJB110 frame: 1 

NLLTTGLLGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
Y 

>SEQ ID NO 2157: 114_COH1 frame: 3

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE 
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI 
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY 
RKLGLVTDIEGNQEITEVFADVEKALL 

>SEQ ID NO 2158: 114_H36B frame: 3 

GDMFRAAMANQTEMGRLAKSYIDKGELVPDEVTNGIVKERLAEDDIAEKGFLLDGYPRTI 
EQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIINRKTGETFHKVFNPPVDYKEE 
DYYQREDDKPETVKRRLDVNIAQGESILEHYRKLGLVTDIEGNQEITEVFADVEKAL 

>SEQ ID NO 2159: 114_JM9130013 frame: 1 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YKKLGLVTDIEGN 

>SEQ ID NO 2160: 114_M732 frame: 1

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE 
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI 
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY 
RKLGLVTDIEGNQEITEVFADVEKALLELK

>SEQ ID NO 2161: 114_M781 frame: 1

NLLITGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQ 

SEQ ID NO. 2201: SAG0093 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
ACAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TTCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC 
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2202: SAG0093 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT)

AAGCCTAACAGTCAACAATCATCACCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
ACGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TGCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC 
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCGAACATCGTTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 

SEQ ID NO. 2203: SAG0093 FROM THE 18RS21 GBS TYPE II STRAIN 

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
ACAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TTCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC 
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2204: SAG0093 FROM THE 2603V/R GBS TYPE V STRAIN 

ACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATTACAATTA
CCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTTCCTGT 
TGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGAACATT 
TAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTG 
ACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATGGATAT
GAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTAC 
GGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATAT 
ATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAAAACCCAGCTTTCTTGTACAA

SEQ ID NO. 2205: SAG0093 FROM THE A909 GBS TYPE Ia STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATAAGAAATT
ACGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TGCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAAATGACTAGTAACCC 
TAATTTGACGAAGGAACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT 
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2206: SAG0093 FROM THE CJB110 GBS NONTYPEABLE STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
TACAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTG
GTTCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACG 
AGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACC
CTAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCG 
ATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTT 
TGTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTG 
CAAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2207: SAG0093 FROM THE COH1 GBS TYPE III STRAIN 

CCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATTAAGAAATTAC
GATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTG
CCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGA
ACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTA
ATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATG 
GATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGT
CTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAA 
AATATATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAAAACCCAGCTTTCTTGTACAA

SEQ ID NO. 2208: SAG0093 FROM THE H36b GBS TYPE Ib STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATAAGAAATT
ACGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG 
TGCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAwGAAATGACTAGTAACCC 
TAATTTGACGAAGGAACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT 
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2209: SAG0093 FROM THE JM9130013 GBS TYPE VIII STRAIN 

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
ACAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG 
TTCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC 
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT 
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 

SEQ ID NO. 2210: SAG0093 FROM THE M732 GBS TYPE III STRAIN 

AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATAAGAAATTA
CGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGT
GCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAG 
AACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCT 
AATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGAT
GGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTG 
TCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCA 
AAATATATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAAAACCCAGCTTTCTT 

SEQ ID NO. 2211: SAG0093 FROM THE M781 GBS TYPE III STRAIN 

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATAAGAAATT
ACGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TGCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC 
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT 
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

>SEQ ID NO 2250: 18_090 frame: 1 

KPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ 

>SEQ ID NO 2251: 18_1169NT frame: 1 

KPNSQQSSPQKLRNEDIKKISSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK 
TAETGVGYEDWHYRYVGVESAKYMAEHRLTLEEYITLLKENNQ 

>SEQ ID NO 2252: 18_18RS21 frame: 1 

KPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ

>SEQ ID NO 2253: 18_2603 frame: 3

SQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPVENI
YLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRGQAE 
KLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGKTAE
TGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQNPAFLY 

>SEQ ID NO 2254: 18_A909 frame: 1

KPNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTKE 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ

>SEQ ID NO 2255: 18_CJB110 frame: 1

KPNSQQSSSQKLRNEDIKKISSQKRNKKFTITSCIIKRLELDFGQS 

>SEQ ID NO 2256: 18_COH1 frame: 1

PNSQQSSSQKLRNEDIKKTSSQKRN 

>SEQ ID NO 2257: 18_H36B frame: 1 

KPNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTXEMTSNPNLTKE 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ

>SEQ ID NO 2258: 18_JM9130013 frame: 1 

KPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK 
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ

>SEQ ID NO 2259: 18_M732 frame: 3

PNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPVE 
NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRGQ 
AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGKT 
AETGVGYEDWHYRYVGVESAKYMVKHHLTLEEYITLLKENNQNPAF

>SEQ ID NO 2260: 18_M781 frame: 1 

KPNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMVKHHLTLEEYITLLKENNQ 

SEQ ID NO. 2301: SAG0163 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTATGAACTCTATATGCGTATTGATGATGAAAGGC 
GGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAA 
AGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCG 
TGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGG 
AAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAA 
GTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGA 
TATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAG 
CGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTTCCGGAGTCTAT 
GATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGG 
AAGCCTAATTGACTTTGAGACAGGTAAQTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAG 
GACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2302: SAG0163 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

GGTGATTGTTATGAAACCTCTACTATTGCGTATTTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGT 
CTTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTC 
AGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTGAG
GTCATCAGGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCGGC
CCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCC
GGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTT
TACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCTCGTGCTGTTATTCGTGCAAGTTTAACGGGA 
GTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTT 
AGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGTAACTTTAAAAAAC
ACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGATATATCAGTAAGAAACAGGCACAAGTCGAAAAAATT
ATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2303: SAG0163 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT) 

GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTA
TGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 
AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTT 
TCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAA 
ATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTA 
AAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAAT
GACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGA
TATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTA
CTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAA 
TTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTG 
GAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAA
CGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2304: SAG0163 FROM THE 2603V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT)

GATATTTATATCATTCCCAAAGGTGATTGTTATGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTT 
TAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTT 
GTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATT
CGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCT 
ATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTA 
TCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCT
TTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCG 
TGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGG 
TTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACA 
GGTAATTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGC 
ACAAGTGCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2305: SAG0163 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTA 
TGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA
AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTT
TCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAA 
ATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTA 
AAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAAT
GACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGA 
TATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTA 
CTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAA 
TTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTG 
GAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAA 
CGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2306: SAG0163 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTA
TGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA
AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTT 
TCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAA
ATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTA 
AAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAAT
GACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGA
TATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTA 
CTATTCATGCTAAAAGTATTTCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAA 
TTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAACTTTAAAAAACACTCATCAGACAAGTG
GAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAA
CGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2307: SAG0163 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

AGGTGATTGTTATGAAATTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTT 
ATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGA 
GGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTC 
ATCAGGACTTAAAATATTGGTTTGATAATATAAAGTAAATGAAGGAAGTACTGTGTGCAAGAGGGCTATATCTTTTTTCCGGCCCT 
GTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGT
AGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTAC
GGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTA 
ATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGA 
AAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGTAACTTTAAAAAACACT
CATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATC
CCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2308: SAG0163 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

TCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTATGAACT
CTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTTG 
TGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTA 
CGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTG 
GTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAA 
CTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAG 
ATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTT 
AATTATCGGAGAGAAATAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGTTTTTTTCTACTATT 
CATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAAT
AGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTGGAATA
GACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAACGGAA 
AGTAGTCCAACTTTT 

SEQ ID NO. 2309: SAG0163 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTA 
TGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 
AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTT 
TCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAA 
ATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTA 
AAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAAT
GACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGA
TATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTA 
CTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAA 
TTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTG
GAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAA
CGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2310: SAG0163 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

TGACTTGTTATGAAACTCTATATGCGTATTTGATGATGAAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTT 
ATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGA 
GGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTC 
ATCAGGACTTAAAATATTGGTTTGATAATATAAAGTAAATGAAGGAAGTACTGTGTGCAAGAGGGCTATATCTTTTTTCCGGCCCT 
GTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGT
AGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTAC
GGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTA 
ATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGA 
AAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGTAACTTTAAAAAACACT
CATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATC 
CCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2311: SAG0163 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

CAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTATGAATTCTATATGCGTATTGATGATGAAAGGCGG
TTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAG
ACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTG
GTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTCATCAGGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGGAA
GTACTGTGTGCAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGT 
ATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATA 
TTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCG
ACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTAATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGA 
TAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAA 
GCCTAATTGACTTTGAGACAAGTAACTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGA 
CATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

>SEQ ID NO 2350:63_090 frame: 2 

AVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRS
QLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIKQMKEVLGTR
GLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDAL 
IKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVYDRLIELGVNYQ 
ELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHISKKQAQVEKII
PQETTESSPTF 

>SEQ ID NO 2351: 63_1169NT frame: 3 

.LL.NLYYCVFDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDYELSEGR
LVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIKQMKEVLGTRGLYLFSGPVGSGK 
TTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALIKLSLRHRPDILI 
IGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQELENSLKLIAYQR 
LIGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGYISKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2352:63_18RS21 frame: X 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2353: 63_2603 frame: 1 

DIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDY
ELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIKQMKEVLGIRGLYLFSG 
PVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALIKLSLRH 
RPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQELENSLK 
LIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHISKKQAQVRKNYPSRNNGK 
.SNF

>SEQ ID NO 2354:63_A909 frame: 1 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2355: 63_CJB110 frame: 1 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2356: 63_CJB110 frame: 1 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2357: 63_H36B frame: 1 

SLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFVAG
MNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIK 
QMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNE 
DIGMTYDALIKLSLRHRPDILIIGEK 

>SEQ ID NO 2358: 63_JM9130013 frame: 1

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2359:63_M732 frame: 3 

TCYETLYAYLMMKRRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDYELSEGRL 
VSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDNIK.MKEVLCARGLYLFSGPVGSGKT
TLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALIKLSLRHRPDILII 
GEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQELENSLKLIAYQRL 
IGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGHISKKQAQVEKIIPQETTESSPTF 

>SEQ ID NO 2360:63_M781 frame: 3 

VEVNAQDIYIIPKGDCYEFYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQ 
LGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDNIKQMKEVLCARG 
LYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALI 
KLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQE
LENSLKLIAYQRLIGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGHISKKQAQVEKIIP
QETTESSPTF

>SEQ ID NO 2361: 63_COH1 frame: 3

VIVMKFYMRIDDERRFIDVFEFNRMASLISHFKWAGMNVGEKRRSQLGSCDYELSEGRL
VSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDNIK 

SEQ ID NO. 2401: SAG0290 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGACCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACAGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG 
ATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAG
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2402: SAG0290 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATRAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC 
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG 
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAA
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2403: SAG0290 FROM THE 2603 V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG 
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAG 

SEQ ID NO. 2404: SAG0290 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC 
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAA 
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2405: SAG0290 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATNNTAATAAAAAACCANTAAAAA 
TNAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGT

SEQ ID NO. 2406: SAG0290 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAA 
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA 

SEQ ID NO. 2407: SAG0290 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGACGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATATAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACAGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGAAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAG
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2408: SAG0290 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAA
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2409: SAG0290 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG 
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTAATAAAGTTTTGAAAGAAA
ATGGTA

SEQ ID NO. 2410: SAG0290 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTAGAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA
ATTCAAAGGTTATGACGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATATAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACAGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGAAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAG 
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA 

SEQ ID NO. 2411: SAG0290 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTAGAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA
ATTCAAAGGTTATGACGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATATAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACAGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGAAAAATTGACTTTATCCTATATGATGCC
ATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG 
ATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAG
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

>SEQ ID NO 2450: 8_1169NT frame: 1

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2451:8_18RS21 frame: 1 

VSVQASEKVELKVATDSDTAPFTYXKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2452: 8_2603 frame: 2

FKGYDVDVVKAVFKGSKYKVTFKTVPFDTISTGIDAGKFDLSANDFSYNKERAEKYLFSD
PISRSNYAVVGKKGSHYKSLSDLSGKSTEVLSGVNYAQVLENWNKNHPNKKPIKIKYVSG
TTGVTSRLKNIESGKIDFILYDAISSDYIVKDQSLNLSVSPLKGKIGNNKDGLEYLLLPK 
DKK 

>SEQ ID NO 2453:8_090 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2454:8_A909 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHXNKKPXKXKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKR 

>SEQ ID NO 2455: 8_CJB110 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL 
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2456: 8_COH1 frame: 1

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2457:8_H36B frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 
FGGDYVSNIDK

>SEQ ID NO 2458: 8_JM9130013 frame: 1

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRNKVLKENG 

>SEQ ID NO 2459:8_M732 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2460:8_M781 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 
FGGDYVSNIDK 

SEQ ID NO. 2501: SAG0368 FROM THE 090 GBS TYPE Ia STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA 
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT 
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT 
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT 
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA
CTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAG
GGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAA 
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTAGTGCTAGTAATGATT
CTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACT 
TATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACAC 
AGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2502: SAG0368 FROM THE 1169NT1 GBS TYPE V STRAIN 

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA 
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTTGGTCAGGAAATAGCGATTCTATGATC 
TTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAA 
TAATGGACAGACTGGCGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACT 
TATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTA 
ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAA 
TGGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAA 
TTCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAA 
ACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAA
AGGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGA
AAGAACTAGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGAT 
TCTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTAC 
TTATAGTTCTGAGACTAATCAAACAACTCATCAAAGTTACTATAATAGTAGCACTCCTGCTAATAACTATAGCAGTAACACTAACA
CAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAATGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2503: SAG0368 FROM THE 18RS21 GBS TYPE II STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT 
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT 
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT 
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA
CTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAG
GGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAA
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATT 
CTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACT
TATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACAC 
AGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2504: SAG0368 FROM THE 2603 V/R GBS TYPE V STRAIN 

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT 
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT 
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA
CTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAG
GGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAA
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATT 
CTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACT 
TATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACAC 
AGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2505: SAG0368 FROM THE A909 GBS TYPE Ia STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT 
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT 
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA 
CTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAG
GGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAA
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATT 
CTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACT 
TATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACAC 
AGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA

SEQ ID NO. 2506: SAG0368 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT 
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT 
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA 
CTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAG 
GGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAA
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATT 
CTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACT 
TATTAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACA 
CAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2507: SAG0368 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

GATTTTAAGCTAGATAAATCAAAAAGTCATGCTATTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGTGTGGACACAGGTTC 
AGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGA
CAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGCGTAGAAGCAAAGCTAAATGCAGCC 
TATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATAT 
GCAAGGATTAGTTGATTTGGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATG 
AACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGAT 
GATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTAT 
TAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGT
TAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTACTCTATCAGATGGTGGCTCTTATCAA
ATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAGCTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAG
CGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATTATTATTATA 
CAACACCCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAGTTA 
CTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTTAATAATTATA 
ACGGGGCTGCAACGCCTAATCCAAACACAGGAACGCAACCAGTACCAGGTCAAACTAATCCA

SEQ ID NO. 2508: SAG0368 FROM THE H36b GBS TYPE Ib STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT 
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTAGTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT 
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT 
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTA

SEQ ID NO. 2509: SAG0368 FROM THE 

TTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTG
TTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCA 
AATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAA 
GCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATAATTATAAT
ACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAATTA 
CTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATA 
ACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2510: SAG0368 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT)

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA
CTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAG 
GGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAA 
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTAGTATGGTACTACTGCTAGTAATGATT
CTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACT
TATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACAC 
AGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2511: SAG0368 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

TTCAATACTATTAATGGGTGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCTTAGTCA
CTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAATAATGGA
CAGACTGGCGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGA 
TATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTGGTCAATGCTGTTGGTGGTATAACAGTAACTAATA 
AATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAA
CAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA 
AGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATA
TTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAA
GACGCTACTCTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAGCT
GGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTA
CTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACTTATAGT
TCTGAGACTAATCAAACAACTCATCAAAGTTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACACAGGTCA 
GGCTGATTCAAGTGGAAGTGTTAATAATTATAACGGGGCTGCAACGCCTAATCCAAACACAGGAACGCAACCAGTACCAGGTCAAA
CTAATCCA

>SEQ ID NO 2550:54_090 frame: 1

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS 
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP 
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2551:54_1169NT frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKLVRK.RFYDLSH
YKS.N..NNDDKLRT.RID.IEWSQK.WTDWRRSKAKCSLCFWWCGNGIDDCSRLIRY.C
.LLYAN.YARIS.FSQCCWWYNSN..I.LSNINCCQ.TRVQGCC.TRDT.NKWRTSTCLF
SYAL..SRGRLWASKKTT.SNSKSP.KNIGVK.Y.FIQKNSFRSK..HAN.Y.DIIKNDS
.FVSL.RFIGTY.ILSVER.RRYFIRWWLLSNFN.ETSTCSSK.N.ERTR.KA..NSEDK
RDSI.RLLWYYC...FFYLFINTRE.L.YNTLFRSTTKLQW.YYL.F.D.SNNSSKLL..
.HSC..L.Q.H.HRSG.FKWKCQ.S.WGCNA.S

>SEQ ID NO 2552:54_18RS21 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2553:54_2603 frame: 1

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2554: 54_A909 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP 
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2555:54_CJB110 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS 
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTY.F.D.SNNSSKLL..

>SEQ ID NO 2556:54_COH1 frame: 1

DFKLDKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVTINPKTNKTTMTSL 
ERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINVDYFMQINMQGLVD 
LVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYSRMRYDDPEGDYGR
QKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIPNLLAYKDSLEHIK 
SYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTSAILYEDYYGTTAS 
NDSSTYSSTQENYYYTTPLFRSTTKLQW.YYL.F.D.SNNSSKLL...HSC..L.Q.H.H
RSG.FKWKC..L.RGCNA.SKHRNATSTRSN.S

>SEQ ID NO 2557:54_H36B frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP

>SEQ ID NO 2558:54_JM9130013 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2559:54_M781 frame: 2 

SILLMGVDTGSEHRKSKWSGNSDSMILVTINPKTNKTTMTSLERDVLIKLSGPKNNGQTG
VEAKLNAAYASGGAEMALMTVQDLLDINVDYFMQINMQGLVDLVNAVGGITVTNKFDFPI
SIAANEPEYKAVVEPGTHKINGEQALVYSRMRYDDPEGDYGRQKRQREVIQKVLKKILAL
NSISSYKKILSAVSNNMQTNIEISSKTIPNLLAYKDSLEHIKSYQLKGEDATLSDGGSYQ 
ILTKKHLLAVQNRIKKELDKKRSKTLKTSAILYEDYYGTTASNDSSTYSSTQENNYNTTP 
YSEAPPSYSGNTTYSSETNQTTHQSYYNSSTPASNYSSNTNTGQADSSGSVNNYNGAATP 
NPNTGTQPVPGQTNP 

SEQ ID NO. 2601: SAG0503 FROM THE 090 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

GGGCACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCC 
TAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGT 
TTTGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAG 
TCAACAAATTTTAAAACGTATGAGGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTG
GTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAA 
CGTTTGAAAGAAATACTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCT 
AAACTTTCCACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATG
TTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGT
ATCACTAATGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAA 
AATAAATGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAG

SEQ ID NO. 2602: SAG0503 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT) 

TTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAAGA
AAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCCA
CTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTCAACAAAT
TTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATG
TCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAA
GAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCC
ACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTG 
TCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTATAGAGTCATCAAATAGTCAGGCAAGTATCACTAAT
GATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATGA
AACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAGTGGTCC 

SEQ ID NO. 2603: SAG0503 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT) 

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAAG
AAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCC 
ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTCAACAAA
TTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGAT
GTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAA
AGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTC
CACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTT
GTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAA
TGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAA

SEQ ID NO. 2604: SAG0503 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

GGACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTA
ACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAACCTCTCAAGGTGGTTT
TGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTC
AACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGT 
AATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACG 
TTTGAAAGAAATTCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTAGCTAA
ACTTTCCACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTT 
TATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTAT
CACTAATGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAA 
TAAATGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA

SEQ ID NO. 2605: SAG0503 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT)

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAAG
AAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTCCC 
ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTCAACAAA 
TTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGAT 
GTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAA 
AGAAATACTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTC
CACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTT
GTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAA
TGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG 
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAA

SEQ ID NO. 2606: SAG0503 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAAG
AAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAACCTCTCAAGGTGGTTTTGTCCC 
ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTCAACAAA 
TTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGAT 
GTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAA 
AGAAATTCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTC 
CACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTT 
GTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAA
TGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG 
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 

SEQ ID NO. 2607: SAG0503 FROM THE JM9130013 GBS TYPE VIII STRAIN 
(REVERSE COMPLEMENT) 

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAAG 
AAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCC 
ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTCAACAAA
TTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGAT 
GTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAA 
AGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTC 
CACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTT 
GTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAA
TGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 

SEQ ID NO. 2608: SAG0503 FROM THE 2603 V/R GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

AGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAA 
GAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTC 
CACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTCAACAA 
ATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGA
TGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGA 
AAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTT 
CCACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTT 
TGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTA 
ATGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAAT 
GAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAGTGG 

SEQ ID NO. 2609: SAG0503 FROM THE M781 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GGACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTA
ACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAACCTCTCAAGGTGGTTT
TGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTC 
AACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGT
AATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACG 
TTTGAAAGAAATTCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAA
ACTTTCCACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTT
TATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTAT
CACTAATGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAA 
TAAATGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 

>SEQ ID NO 2650:103_090 frame: 2 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVP
LLSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLA
VIRKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKM
QTVIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDH
FHPNNIGYQIMSNAVMEKINETRKNWP

>SEQ ID NO 2651:103_H36B frame: 2 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS
ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR
KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV
IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGIIESSNSQASITNDALFTGDHFHP
NNIGYQIMSNAVMEKINETRKNWP

>SEQ ID NO 2652:103_18RS21 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS
ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR
KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV
IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP
NNIGYQIMSNAVMEKINETRKNWP

>SEQ ID NO 2653:103_COH1 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPL
LSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAV
IRKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQ
TVIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHF
HPNNIGYQIMSNAVMEKINETRKNWP

>SEQ ID NO 2654:103_CJB110 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS
ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR
KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV
IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP
NNIGYQIMSNAVMEKINETRKNWP

>SEQ ID NO 2655:103_1169NT frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS
ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR
KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV
IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP
NNIGYQIMSNAVMEKINETRKNWP

>SEQ ID NO 2656:103_JM9130013 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS
ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR
KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV
IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP
NNIGYQIMSNAVMEKINETRKNWP

>SEQ ID NO 2657:103_2603 frame: 1

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLL
SESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVI
RKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQT
VIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFH
PNNIGYQIMSNAVMEKINETRKNWP

>SEQ ID NO 2658:103_M781 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPL
LSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAV
IRKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQ
TVIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHF
HPNNIGYQIMSNAVMEKINETRKNWP

SEQ ID NO. 2701: SAG1473 FROM THE 1169NT1 GBS TYPE V STRAIN
(REVERSE COMPLEMENT) 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGAGTACGACCTTATCTGAGGAGAAAAGATCAGATGAACTAGACCAGTCTAG
TACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTACAACAGAACCAT
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATC
TTCAAAAGCAAGTGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2702: SAG1473 FROM THE 18RS21 GBS TYPE II STRAIN

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGAACTAGACCAGTCTAG
TACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTACAACAGAACCAT
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATC 
TTCAAAAGCAAATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2703: SAG1473 FROM THE 2603 V/R GBS TYPE V STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGAACTAGACCAGTCTAG
TACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTAGAACAGAACCAT
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATC
TTCAAAAGCAAATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2704: SAG1473 FROM THE 090 GBS TYPE Ia STRAIN

GACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTAC 
AACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAGGATATTT
CTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGAT
GAATCATCATCTTCAAAAGCAAATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2705: SAG1473 FROM THE A909 GBS TYPE Ia STRAIN

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATTAGATGAACTAGACCAGTCTAG 
TACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCAT
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATC 
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2706: SAG1473 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGAACTAGACCAGTCTAG
TACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTACAACAGAACCAT
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATC
TTCAAAAGCAAATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2707: SAG1473 FROM THE COH1 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGAACTAGACCAGTCTAG 
TACTGGTTCTTCTTCTGAAAATGAATCAAGTTCATCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCAT
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGGAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGAACGCGATGAATCATCATC
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2708: SAG1473 FROM THE H36b GBS TYPE Ib STRAIN

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATTAGATGAACTAGACCAGTCTAG 
TACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCAT
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAArAAGTGGATCGCGATGAATCATCATC
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2709: SAG1473 FROM THE JM9130013 GBS TYPE VIII STRAIN

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTAGGACCTTATCTGAGGAGAAAAGATTAGATGAACTAGACCAGTCTAG
TACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATC
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2710: SAG1473 FROM THE M732 GBS TYPE III STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGAACTAGACCAGTCTAG
TACTGGTTCTTCTTCTGAAAATGAATCAAGTTCATCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGGAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGAACGCGATGAATCATCATC
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2711: SAG1473 FROM THE M781 GBS TYPE III STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGAACTAGACCAGTCTAG
TACTGGTTCTTCTTCTGAAAATGAATCAAGTTCATCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCAT
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGGAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA
AAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATC
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

>SEQ ID NO 2750:4_1169NT frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKASD 
GKKGHSKPKKE 

>SEQ ID NO 2751:4_18RS21 frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND 
GKKGHSKPKKE 

>SEQ ID NO 2752:4_2603 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND 
GKKGHSKPKKE 

>SEQ ID NO 2753:4_090 frame: 1 

DQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQPSPSEENKPDGRTKTEIGNNKDISSG 
TKVLISEDSIKNFSKASSDQEEVDRDESSSSKANDGKKGHSKPKKE 

>SEQ ID NO 2754:4_A909 frame: 1 

DTSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND 
EKKGHSKPKKE 

>SEQ ID NO 2755:4_CJB110 frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
GKKGHSKPKKE 

>SEQ ID NO 2756:4_COH1 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVERDESSSSKAND
EKKGHSKPKKE 

>SEQ ID NO 2757:4_H36B frame: 1 

DTSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEXVDRDESSSSKAND 
EKKGHSKPKKE 

>SEQ ID NO 2758:4_JM9130013 frame: 1 

DTSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP 
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
EKKGHSKPKKE 

>SEQ ID NO 2759:4_M732 frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVERDESSSSKAND
EKKGHSKPKKE 

>SEQ ID NO 2760:4_M781 frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
EKKGHSKPKKE

SEQ ID NO. 2801: SAG1552 FROM THE 1169NT1 GBS TYPE V STRAIN
(REVERSE COMPLEMENT)

TTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGC
AGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAGTGGTTCCATTTAATTTCCAACATGGGGGCAAATACTG
TAAGAGTCAAAGTACCGATGAATGTTGCATTTTACGATGCTTTATATCACCACAACAAAGCATCAAAGAGGCCACTGTATTTGTTG
CAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGC
AAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCATTATGATCTTAGTC
CTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATAT
AAAGGACGTTATTTTAAAACTTCTGCGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTTATGGATGAATTGACACATTATGA
GACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCGTTATCGAAAACCATTTGAGG
CACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCGAATGTTAAAGCAGGTATTTTTGCAGCATATAAA
GCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGA
ACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAG
CGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTAT
GAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCCTT
CGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAA
AACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTG

SEQ ID NO. 2802: SAG1552 FROM THE

ATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACT
AAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAAT
CTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTAT 
CTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATCAATATGGTATTGAG 
AAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAG 
GAACAATTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGG
CAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGA 
GAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGAC 
CCGATACCAAAACCTTTTTAAAAGACTCCTATTATAGTATTTAAGAAAGAA

SEQ ID NO. 2803: SAG1552 FROM THE 18RS21 GBS TYPE II STRAIN 

AAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGT
TGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGT 
TCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCAC 
AACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAA 
TGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATT 
TGGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCT 
TATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCT
AGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAA 
CAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCA
AATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAA 
TATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCC
CTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAA 
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGA 
CGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATC 
AAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCT
CTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAA 
ACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTA 
AATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAAC 
TATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATT 
GAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAA 
CAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCG
TGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAAT 
TGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGA
GACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATGTATTAAGAAAGAA 

SEQ ID NO. 2804: SAG1552 FROM THE 2603 V/R GBS TYPE V STRAIN
TATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAA
GGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCATTT 
AATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTAGGATGCCTTATATCACCACAACAAAG
CATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAAT
TATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTGGGTAG 
CCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTTATACTA 
ATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAA 
GTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCC 
TTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTA 
AAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCAGT
AAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCT 
AGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAAC
AAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGG 
AATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
TGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGA 
CTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAA 
GAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAG 
TGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTC 
GACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGAAAT
ACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAGGAAC
AACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGT
TGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGC
ATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGA 
TACCAAAACCTTTTTAAAAGACTCCTATTATAGTATTAAGAAAGAATGGTCTAAAGAAAGAGAGAGAACATATGGTCCA 

SEQ ID NO. 2805: SAG1552 FROM THE A909 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

AAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGT
TGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGT
TCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCAC
AACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAA
TGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATT
TGGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCT
TATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCT
AGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAA
CAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCA
AATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAA
TATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCC
CTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAA
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGA
CGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATC
AAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCT
CTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAA
ACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTA
AATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAAC
TATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATT
GAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAA
CAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCG
TGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAGAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAAA
TTGAGAGCCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGA
GAGACCCGATACCAAAACCTTTTTAAAAGA

SEQ ID NO. 2806: SAG1552 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

TATTACTTTGATGGTAGTTTGTATTTACCAAAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGT 
ACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTC 
CTATTACTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAAT
GTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTA
TCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCC 
ATGGGCGTAAGCAAGTATGGAATACAGATTTTGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTA 
GGGGATGATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTC 
TGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAAC
ATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAA 
CTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATA
CAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACG
TTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAA 
ATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAG
TTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAATCAAT 
TCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGT 
AAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCT
CTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAA 
AAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTC 
CAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAG 
TAGTAATTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCT
TACCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTT 
GGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTT
TAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGA
TGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATGTATTAAGAAAGA 

SEQ ID NO. 2807: SAG1552 FROM THE COH1 GBS TYPE III STRAIN

TTTACCACAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAAC 
CTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGT 
GAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATA 
TCACCACAACAAAGAATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAG 
CTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAAT 
ACTGATTTTGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTAC 
TGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGG 
TCATGCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCA 
CCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGC
TAATTCAAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCAC
AAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGAT 
TAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCAT 
GGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTA 
TTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAA
ACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAAC
CTGAAAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACA 
TTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAA 
AGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATA 
TGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTT 
CTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAACCAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAG 
AATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGT 
TAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAAT 
TGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACT

SEQ ID NO. 2808: SAG1552 FROM THE H36b GBS TYPE Ib STRAIN

AAGGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTG
TTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGG
TTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCA
CAACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTA
ATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGAT
TTTGGTAGCAGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATGGACATAGTGGTACTGTCGC
TTTATACTAATCATCAAGAGGAGAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCAT
GCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAA
CAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAAT
TCGAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGA
GAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAA
TCCCTGTTCTAGTCACGGGTTATGGCTACTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAAT
GAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCA
AGACGATTGGAATGCAAGGGTGTGGAATACATCCTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTA
ATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAGGTTGATGGTAAAAGAGGCAAAGAAGAGTGGAAACAT
CCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGA
AAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTT
CTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAACGCCTTAAAAGCG
AACTATCTTCGACAGCTTAATGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT
ATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCA
AAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATT
CCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGA
AATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGG
AGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATAGT

SEQ ID NO. 2809: SAG1552 FROM THE JM9130013 GBS TYPE VIII STRAIN 

ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTA 
GCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATAC 
TGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCAAAGAGGCCACTGTATTTGT
TGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAA 
GCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCAGTCATTATCATTATGATCTTAG 
TCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAAT 
ATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTGACACATTAT 
GAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGA 
GGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCGAATGTTAAAGCAGGTATGTTTGCAGCATATA
AAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAA
GAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTACTCGAC 
AGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATT 
ATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGTGTGGAATACATCC 
TTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGC 
AAAACATCATTATCAGGTTGATGGTAAAAGAGGCAAAGAAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTAT
ATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATAGAT
ATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCC
AAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAACGCCTTAAAAGCGAACTATCTTCGACAGCTTAATGGTAAAGATTTTT
ATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAA 
AAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT
TGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCAT
CTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAAT
AGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTC
CTATTATAGTATTAAGAAAG 

SEQ ID NO. 2810: SAG1552 FROM THE M732 GBS TYPE III STRAIN 

TACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTG 
AGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATG 
GGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGAATCAAAGAGGCC 
ACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATT 
TAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCAT 
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGCAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAA 
AAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAAT 
TGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGA 
AAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCAGGTATGTT
TGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGAC
AAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTAT
GGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTT 
ACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGT 
GGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGC 
TTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGG
AGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTAT 
TACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTG 
TCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGG 
TAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTG
AAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCAC 
CAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTC 
TGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGAT 
TAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTT
TTAAAAGACTCCTATTATAGTATTAAG 

SEQ ID NO. 2811: SAG1552 FROM THE M781 GBS TYPE III STRAIN 

TTTGATGGTAGTTTGTATTTACCACAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCA 
CAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTA
CTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCA
TTTTACGATGCCTTATATCACCACAACAAAGAATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAA 
TAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGC 
GTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGAT
GATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGC
AGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGA
TTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAAT
GTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGA
TTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAAC
TGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGAT 
AAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGG 
AGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTAT 
GGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGA 
GGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCT 
TGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGA
ATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAG 
CGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAA 
TTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAA
CTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAG
GACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACA
TTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAG
ATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATAGTATTAAGAAAGAATGG

>SEQ ID NO 2850: 62_1169NT frame: 1

FVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHLISNMGANTVRV
KVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLKREAKGVVD
ILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKKTQYKGRYFKTS
AAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFRYRKPFEAQAPKYVQLNV
ENIQANSNVKAGIFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQGYVKLLNA
YHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGSFGATINAW
QDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRGKGEWKHPL
MTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSKVTFSKSSD
FVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINMVLRNTKIV
EDMEKVKATERFLPTHPTGLLKTGTIDRHQKTFDSQTDISFGKDFIEVRIPWQLLNFSDP
SSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDTKTFLKDSY
YSI.ER

>SEQ ID NO 2851:62_18RS21 frame: 1

KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG 
YLKREAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKK
TQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFE
AQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELS 
LSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFIS
SGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDG 
KRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMN
GSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQ 
INMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVR 
IPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWER 
PDTKTFLKDSYYVLRK 



>SEQ ID NO 2852:62_2603 frame: 3 

LKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHLISN
MGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLK 
REAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKKTQY
KGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFEAQA
PKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQ 
GYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGS 
FGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRG 
KGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSK 
VTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINM 
VLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVRIPW 
QLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDT
KTFLKDSYYSIKKEWSKERERTYGP 

>SEQ ID NO 2853:62_A909 frame: 1 

KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG 
YLKREAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKK
TQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFE
AQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELS
LSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFIS 
SGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDG 
KRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMN 
GSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQ 
INMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVR 
IPWQLLNFSDPSSQRIHDDYFKHYGVKELEN.EPLL.D.VLIAKKTH..RWQIIV.KIGR
DPIPKPF.K 

>SEQ ID NO 2854:62_A909 frame: 1 

KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG 
YLKREAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKK
TQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFE 
AQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELS 
LSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFIS
SGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDG 
KRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMN 
GSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQ
INMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVR
IPWQLLNFSDPSSQRIHDDYFKHYGVKELEN.EPLL.D.VLIAKKTH..RWQIIV.KIGR
DPIPKPF.K 

>SEQ ID NO 2855:62_CJB110 frame: 1 

YYFDGSLYLPKGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPIT
QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
ITAFNDNYRGYLKREAKGVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGT
VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
DPFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
EDRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
LLEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHNQFLWGDAQVFNQGYGLLGFK
NAKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
TPKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
PKKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI 
SFGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM 
ADYRLKNWERPDTKTFLKDSYYVLRK 

>SEQ ID NO 2856:62_COH1 frame: 2

LPQGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWF
HLISNMGANTVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNASITAFNDNY 
RGYLKREAKGVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQE
KKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKP 
FEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKE 
LSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESF 
ISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQV
DGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRK 
MNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNF 
EQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQPDISFGKDFIE 
VRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNW 
ERPDTKTFLKD 

>SEQ ID NO 2857:62_H36B frame: 2 

RGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG 
YLKREAKGVVDILHGRKQVWNTDFGSSHYHYDLSPWVLGYVVGDDGHSGTVALY

>SEQ ID NO 2858:62_JM9130013 frame: 3

FVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHLISNMGANTVRV
KVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLKREAKGVVD
ILHGRKQVWNTDFGSSHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKKTQYKGRYFKTS
VAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFEAQAPKYVQLNV 
ENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQGYVKLLNA
YHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGSFGATINAW 
QDDWNARVWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRGKEEWKHPL
MTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSKVTFSKSSD
FVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINMVLRNTKIV 
EDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVRIPWQLLNFSDP 
SSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDTKTFLKDSY 
YSIKK 

>SEQ ID NO 2859:62_M732 frame: 2 

TRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHLISNMGAN
TVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLKREAK 
GVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDCNSGTVAYTNHQEKKTQYKGRY
FKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFEAQAPKYV
QLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQGYVK 
LLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGSFGAT 
INAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRGKGEW 
KHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSKVTFS 
KSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINMVLRN 
TKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVRIPWQLLN
FSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDTKTFL 
KDSYYSIK 

>SEQ ID NO 2860:62_M781 frame: 1 

FDGSLYLPQGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQK
TYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNASIT
AFNDNYRGYLKREAKGVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGTVA
YTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDP
FHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKED 
RQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLL 
EDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNA 
KHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITP 
KSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPK 
KNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISF 
GKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMAD 
YRLKNWERPDTKTFLKDSYYSIKKEW 

SEQ ID NO. 2901: SAG1641 FROM THE 090 GBS TYPE Ia STRAIN

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGAGCTTTTCTGACACTGAAAAAGCACGTTG
GGATAAAATTGAAAAGCTAGTAGGCGATAAAGCTAAAATCAAATTCACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG 
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA
CTTGAAAAGACTTACTTAGCCCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA 
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAAGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTC 
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTA
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGGAACCCAGCTTTCTTG 
TACAA 

SEQ ID NO. 2902: SAG1641 FROM THE 1169NT1 GBS TYPE V STRAIN 
(REVERSE COMPLEMENT)

ATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGG
GATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGC
CAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCAC
TTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCA
ATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAA 
GGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCA 
AAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCA 
GATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTAT 
CTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2903: SAG1641 FROM THE 18RS21 GBS TYPE II STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA
CTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTC
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTA
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCAC

SEQ ID NO. 2904: SAG1641 FROM THE 2603 V/R GBS TYPE V STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG 
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG 
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA
CTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC 
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA 
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCCAGTCAAACACCACGTGCACTC 
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC 
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTA 
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2905: SAG1641 FROM THE A909 GBS TYPE Ia STRAIN

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA
CTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA 
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTC
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC 
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTA
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2906: SAG1641 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

AAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGCGATA 
AAGCTAAAATCAAATTCACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAGGATGTGGATATTAATGCCTTT
CAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCCCCAATTCG
TATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCC 
GTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCT 
AATAAAAAAGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAA
TACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGGATTAATA
TCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTG 
AAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGGAA

SEQ ID NO. 2907: SAG1641 FROM THE COH1 GBS TYPE III STRAIN
(REVERSE COMPLEMENT) 

AGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGATAAAA
TTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG
GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTGAAAA 
GACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAA
ATGATGCAACAAATGGTAGCCGTGCATTGTATGTACTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCA 
ACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGT
AGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAA 
ATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGAT 
GCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2908: SAG1641 FROM THE H36b GBS TYPE Ib STRAIN

AAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGAT 
AAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAA
TAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTG
AAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATT 
CCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGT 
TGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAG
ATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGAT 
AAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTT
GGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2909: SAG1641 FROM THE JM9130013 GBS TYPE VIII STRAIN

TTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTG 
AAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAGGAT






WO 2004/018646 



PCT/US2003/026827 



SEQUENCE LISTING 



GTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTGAAAAGAC 
TTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATG 
ATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACA 
GTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGA 
TGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATT
CAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCT
TATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2910: SAG1641 FROM THE M732 GBS TYPE III STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG 
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG 
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA
CTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA 
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTC
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC 
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTA
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATAC

SEQ ID NO. 2911: SAG1641 FROM THE M781 GBS TYPE III STRAIN

AGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGATAAAA 
TTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG
GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTGAAAA
GACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAA 
ATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCA 
ACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGT 
AGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAA
ATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTGGGAT 
GCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

>SEQ ID NO 2950: 35_090 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKDTSADIPQWNPAFLY

>SEQ ID NO 2951: 35_1169NT frame: 3 

QEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKD
VDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDAT
NGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIIN 
NTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKK
VIKDTSADIPQW 

>SEQ ID NO 2952: 35_18RS21 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII 
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKDTSADIP 

>SEQ ID NO 2953:35_2603 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKDTSADIPQW

>SEQ ID NO 2954:35_A909 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKDTSADIPQW
>SEQ ID NO 2955:35_CJB110 frame: 2

SKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVDINAFQHY
NFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATNGSRALYVL 
QSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNTYIEQANL 
KPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKVIKDTSADI 
PQW 

>SEQ ID NO 2956:35_COH1 frame: 2

VSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVD
INAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATNG
SRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNT 
YIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKVI
KDTSADIPQW 

>SEQ ID NO 2957:35_H36B frame: 3

EVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDV
DINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATN 
GSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINN 
TYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKV
IKDTSADIPQW 

>SEQ ID NO 2958:35_JM9130013 frame: 2

SASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVDI
NAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATNGS
RALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNTY 
IEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKVIK
DTSADIPQW 

>SEQ ID NO 2959:35_M732 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII 
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKD 

>SEQ ID NO 2960:35_M781 frame: 2 

VSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVD
INAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATNG
SRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNT 
YIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAIWDAYHTDEVKKVI
KDTSADIPQW 
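The "frame:" annotations on the translated entries can be checked against the nucleotide entries. The following is a minimal Python sketch, not part of the original listing, assuming the standard genetic code (NCBI translation table 1) and a frame convention inferred from these entries rather than stated in them: frames 1 to 3 read the listed strand from offsets 0 to 2, and frames 4 to 6 do the same on its reverse complement. The names `translate_frame` and `reverse_complement` are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# each position cycling through T, C, A, G; "*" marks a stop codon.
_BASES = "TCAG"
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate_frame(seq: str, frame: int) -> str:
    """Translate seq in reading frame 1-6 (assumed convention, see above).

    Codons containing characters other than A, C, G, T become "X".
    """
    if not 1 <= frame <= 6:
        raise ValueError("frame must be between 1 and 6")
    # Drop whitespace left over from line wrapping and normalise case.
    seq = "".join(seq.split()).upper()
    if frame > 3:
        seq = reverse_complement(seq)
    offset = (frame - 1) % 3
    # Step through complete codons only; a trailing partial codon is ignored.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    # First 60 nt of SEQ ID NO. 2904, read in frame 1.
    print(translate_frame(
        "AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC", 1))
    # NQEVSASSTSSKVVKVGVMT
```

Under this convention, frame 1 of the start of SEQ ID NO. 2904 reproduces the opening residues of SEQ ID NO 2953, and frame 2 of the start of SEQ ID NO. 2906 (AAGTAAAGTTGTTAAAGTTGGTGTTATGACC) gives SKVVKVGVMT, as in SEQ ID NO 2955. Stops are returned as "*"; the listing itself appears to print them as ".".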

SEQ ID NO. 3001: SAG2147 FROM THE 1169NT1 GBS TYPE V STRAIN
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCC
AAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCT
CCAAAACCTTCTCAGGCATCTAATGAAGTCCCAAAATCAAGTTCTCAATCTACAGAAGCT 
AATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAACA
GAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTAC
AAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGCG
GTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGG 
GAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCT 
TCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTT 
AATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 3002: SAG2147 FROM THE 18RS21 GBS TYPE II STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTC
GCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAA
AACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTA
CAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAG
TTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGA
CAACTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTG
CAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGT
CTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCT 
CAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGG 
ATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 3003: SAG2147 FROM THE 2603 V/R GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGT
TCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGT
AAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATC
TACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGC
AGTTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGA
GACAACTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATAC
TGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCA
GTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGC
CTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCA
GGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTA
C

SEQ ID NO. 3004: SAG2147 FROM THE 090 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

TAGCCAAAAAATCAAAAATGATTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAAC
AGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAG
AAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTG 
TAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAA 
CTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTGCAG 
GGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTA 
CTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAG 
GAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGA

SEQ ID NO. 3005: SAG2147 FROM THE A909 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

AAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCA
TCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTT
ACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACC
AGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAG 
ACAAGTGGCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCA
GCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGT 
GAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACG
ATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGAATCAAGTTAATTCAGCTATTAAAGCT
TATCGTGCTCAAGGTTTATCA 

SEQ ID NO. 3006: SAG2147 FROM THE CJB110 GBS NONTYPEABLE STRAIN 
(REVERSE COMPLEMENT) 

AATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGA
CATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATG
AAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGA
GTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGG
CACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAGACGAGTG
GCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAA
TGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAA 
ATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAG 
GTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTG
CTCAAGGTTTATCAGCTTGGGGTTAC

SEQ ID NO. 3007: SAG2147 FROM THE COH1 GBS TYPE III STRAIN
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAA
AGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGA
TGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCA
ATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACA
AGCAGTTGTAACAGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTAC
TGAGACAACTTACAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAA
TACTGCAGGGGCGGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCC 
TCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAA
TGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGT 
TCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGG 
TTAC 

SEQ ID NO. 3008: SAG2147 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGC
AGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGT
AGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAG
TTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGT 
AGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGC
TGTTACTGAGACAACTTATAGACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGTAA
TGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGG
AGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGT 
TGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGC 
TACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTT 

SEQ ID NO. 3009: SAG2147 FROM THE M732 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGC
CAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGC
TCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGC
TAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAAC
AGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTA
CAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGC
GGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTG
GGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGC
TTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGT
TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTA

SEQ ID NO. 3010: SAG2147 FROM THE M781 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GTAACCCCAAGCTGATAAACCTTGAGCACGATAAGCTTTAATAGCTGAATTAACTTGATC 
CTGAACTGTAGCTGTTGAACCCCAACCTGGCATCGTTTGGAAAAGTCCTGAAGCTCCTGA 
GGCATTAGCAACATTAGGATTACCATTTGATTCACGGGCAATAATATGTTCCCAAGTAGA
CTGAGGGACTCCTGTTGCAGCAGCCATTTGTGCTGCAGCAGCAGATCCGACCGCCCCTGC 
AGTATTTCCATTGCTCAATACTTGGCCACTTGTCTGGTGTTGAGCAGGTTTGTAAGTTGT 
CTCAGTAACAGCATAAGTTTGTTGTGCCTGACTGGTAGCAGGGGTATTTTCTGTTACAAC
TGCTTGTTCTACAGCCGCCTCTTCACTCGCAGTAACTTGTTGCTGAGAATTAGCTTCTGT 
AGATTGAGAACTTGATTTTGGGGCTTCATTAGATGCCTGAGAAGGTTTTGGAGCCTGTTT 
TACATCTTCTACTTTTGATTTAGATGTCGCCTTAGTCATTTTTGATTTTTTGGCTACGCG 
AACTTTATCTGCTTTTGACAAAGA 

>SEQ ID NO 3050: 25_1169NT frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEVPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3051:25_18RS21 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3052:25_2603 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI 
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3053:25_090 frame: 3 

AKKSKMIKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVV 
TENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQST
WEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQ 

>SEQ ID NO 3054:25_A909 frame: 1 

KATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVVTENTPAT
SQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQSTWEHIIAR 
ESNGNPNVANASGASGLFQTMPGWGSTATVQNQVNSAIKAYRAQGLS 

>SEQ ID NO 3055:25_CJB110 frame: 3 

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS
EEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQM
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA 
QGLSAWGY 

>SEQ ID NO 3056:25_COH1 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3057:25_H36B frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKA 

>SEQ ID NO 3058:25_M732 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV 
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWG 

>SEQ ID NO 3059:25_M781 frame: 4 

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS 
EEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAVGSAAAAQM
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA 
QGLSAWGY 

SEQ ID NO. 3101: SAG2148 FROM THE 1169NT1 GBS TYPE V STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3102: SAG2148 FROM THE 18RS21 GBS TYPE II STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG
GCTGGTAT 

SEQ ID NO. 3103: SAG2148 FROM THE 2603 V/R GBS TYPE V STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3104: SAG2148 FROM THE 090 GBS TYPE Ia STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTAAAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3105: SAG2148 FROM THE A909 GBS TYPE Ia STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3106: SAG2148 FROM THE CJB110 GBS NONTYPEABLE STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTAAAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3107: SAG2148 FROM THE COH1 GBS TYPE III STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAATAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAAGAAATAGCTCGTCGTGAA 
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3108: SAG2148 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT) 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3109: SAG2148 FROM THE JM9130013 GBS TYPE VIII STRAIN 
(REVERSE COMPLEMENT) 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGACGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAACTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3110: SAG2148 FROM THE M732 GBS TYPE III STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAATAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3111: SAG2148 FROM THE M781 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAATAGTTAGTGTCTCTCAA
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG
GCTGGTAT 

>SEQ ID NO 3150:15_1169NT frame: 1

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK 
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3151:15_18RS21 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK 
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3152:15_2603 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK 
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3153:15_090 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSKASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3154:15_A909 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3155:15_CJB110 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSKASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3156:15_COH1 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQ.LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3157:15_H36B frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3158:15_JM9130013 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTTSQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3159:15_M732 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQ.LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3160:15_M781 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQ.LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

SEQ ID NO 4001: SAG0653 FROM THE 2603 V/R GBS TYPE V STRAIN

ATGAAGAAAGTGTTAGTGAGTAGTCTTTTGGTTTTAGGGATTACGATA
ACGTTACAAACAGTAGTTGAGGCTAAGGGGCCAAAAGTAGCTTATACACAAGAGGGAATG 
ACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTAGTATTTCTATTGACGAGATTCAA
AAAAGCTTAGAAGGTAAGAAGCCGATTACTGTTAGTTTTGATATTGATGATACACTGCTT
TTCAGTAGTCAATATTTTCAATATGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTT 
CTTCATAAACAAAAATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCC
AAAGAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAATTGTTTTT 
ATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGGTTGATAAAACAGCTAAA 
GCCTTAGCTAAAGATTTTAAATTAGACAAACCAATTGCTGTAAATTATACAGGCGATAAA
CCTAAAAAGCCATACAAATATGATAAATCATATTATATTAAGAAATATGGTTCAGACATT
CATTATGGAGATAGTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATT
AGAATTTTAAGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGGCTACGGT 
GAAGAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4002: SAG0653 FROM THE 090 GBS TYPE III STRAIN

AAGGGGCCAAAAGTAGCTTATACACAAGAGGGAATGAC
TGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTTCTATTGACG
AGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTTAGTTTTGAT
ATTGATGATACACTACTTTTCAGTAGTCAATATTTTCAATATGGTAAAGA
ATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAAAATTCTGGG 
ATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAAGAATATGCT
AAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAATTGTTTTTAT 
AACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGGTTGATAAAA
CAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCAATTGCTGTA
AATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGATAAATCATA
TTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATAGTGATGACG
ATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGAATTTTAAGA
GCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGGCTACGGTGA 
AGAGGTTCTCGAAAATTCAGCTTAC 

SEQ ID NO 4003: SAG0653 FROM THE A909 GBS TYPE Ia STRAIN

AAGGGGCCAAAAGTAGCTTATACACA
AGAGGGAATGACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTA
TTTCTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACT
GTTAGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCA 
ATATGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAAC 
AAAAATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCC 
AAAGAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAA
AATTGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCG
AGGTTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAA 
CCAATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATA
TGATAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAG
ATAGTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATT 
AGAATTTTAAGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGG
AGGCTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAC 

SEQ ID NO 4004 : SAG0653 FROM THE 18RS21 GBS TYPE II STRAIN 

AAGGGGCCAAAAGTAGCTTATACACAAGA
GGGAATGACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTT
CTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTT
AGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATA
TGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAA
AATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAA
GAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAAT
TGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGG
TTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCA
ATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGA
TAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATA
GTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGA
ATTTTAAGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGG
CTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4005 : SAG0653 FROM THE M732 GBS TYPE III STRAIN 

AAGGGGCCAAAAGTAGCTTATACACAAGA
GGGAATGACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTT
CTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTT
AGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATA
TGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAA
AATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAA
GAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAAT
TGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGG
TTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCA
ATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGA
TAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATA
GTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGA
ATTTTAAGAGCACCTAATTCTACAAATCTACCTTTAGCAGAAGCTGGAGG
CTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4006 : SAG0653 FROM THE COH1 GBS TYPE III STRAIN 

AAGGGGCCAAAAGTAGCTTATACACAAGAGGGAATGACT
GCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTTCTATTGACGA
GATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTTAGTTTTGATA
TTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATATGGTAAAGAA
TATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAAAATTCTGGGA
TCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAAGAATATGCTA
AAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAATTGTTTTTATA
ACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGGTTGATAAAAC
AGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCAATTGCTGTAA
ATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGATAAATCATAT
TATATTAAGAAATATGGTTCAGACATTCATTATGGAGATAGTGATGACGA
TATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGAATTTTAAGAG
CACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGGCTACGGTGAA
GAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4007 : SAG0653 FROM THE M781 GBS TYPE III STRAIN 

AAGGGGCCAAAAGTAGCTTATACACA
AGAGGGAATGACTGCTCTTTCGGACACAAAGATAAAGTCACTACTA
TTTCTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACT
GTTAGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCA
ATATGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAAC
AAAAATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCC
AAAGAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAA
AATTGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCG
AGGTTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAA
CCAATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATA
TGATAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAG
ATAGTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATT
AGAATTTTAAGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGG
AGGCTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAG

SEQ ID NO 4008 : SAG0653 FROM THE CJB110 GBS NONTYPEABLE STRAIN

AAGGGGCCAAAAGTAGCTTATACACAAGA
GGGAATGACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTT
CTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTT
AGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATA
TGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAA
AATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAA
GAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAAT
TGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGG
TTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCA
ATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGA
TAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATA
GTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGA
ATTTTAAGAGCACCTAATTCTAGAAATCTACCTTTACCAGAAGCTGGAGG
CTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAC 

SEQ ID NO 4009 : SAG0653 FROM THE JM9130013 GBS TYPE VIII STRAIN 

AAGGGGCCAAAAGTAGCTTATACACAAGAGGGAAT
GACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTTCTATTG
ACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTTAGTTTT
GATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATATGGTAA
AGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAAAATTCT
GGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAAGAATAT
GCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAATTGTTTT
TATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGGTTGATA
AAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCAATTGCT
GTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGATAAATC
ATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATAGTGATG
ACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGAATTTTA
AGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGGCTACGG 
TGAAGAGGTTCTCGAAAATTCAGCTTAC 

SEQ ID NO 4010 : SAG0653 FROM THE 2603 V/R GBS TYPE V STRAIN

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY 
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS 
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4011 : SAG0653 FROM THE 090 GBS TYPE III STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY 
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS 
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4012 : SAG0653 FROM THE A909 GBS TYPE Ia STRAIN

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS 
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4013 : SAG0653 FROM THE 18RS21 GBS TYPE II STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY 
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4014 : SAG0653 FROM THE COH1 GBS TYPE III STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4015 : SAG0653 FROM THE M781 GBS TYPE III STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4016 : SAG0653 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4017 : SAG0653 FROM THE JM9130013 GBS TYPE VIII STRAIN

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4018 : SAG0653 FROM THE M732 GBS TYPE III STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO. 4101: SAG0649 FROM 2603 V/R GBS TYPE V STRAIN 
ATGAAAAAGAGACAAAAAATA
TGGAGAGGGTTATCAGTTACTTTACTAATCCTGTCCCAAATTCCATTTGGTATATTGGTA
CAAGGTGAAACCCAAGATACCAATCAAGCACTTGGAAAAGTAATTGTTAAAAAAACGGGA
GACAATGCTACACCATTAGGCAAAGCGACTTTTGTGTTAAAAAATGACAATGATAAGTCA
GAAACAAGTCACGAAACGGTAGAGGGTTCTGGAGAAGCAACCTTTGAAAACATAAAACCT
GGAGACTAGACATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTGATAAAACC
TGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGAGGGTATGGATGCAGATAAA
GCAGAGAAACGAAAAGAAGTTTTGAATGCCCAATATCCAAAATCAGCTATTTATGAGGAT
ACAAAAGAAAATTACCCATTAGTTAATGTAGAGGGTTCCAAAGTTGGTGAACAATACAAA
GCATTGAATCCAATAAATGGAAAAGATGGTCGAAGAGAGATTGCTGAAGGTTGGTTATCA
AAAAAAATTACAGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAATTAACTGTT
GAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACCACTAGATGTCGTTGTGCTA
TTAGATAATTCAAATAGTATGAATAATGAAAGAGCCAATAATTCTCAAAGAGCATTAAAA
GCTGGGGAAGCAGTTGAAAAGCTGATTGATAAAATTACATCAAATAAAGACAATAGAGTA
GCTCTTGTGACATATGCCTCAACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGA
GTTGCCGATCAAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATCATAAAACT
ACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTTAACAAATGATGCTAACGAA
GTTAATATTCTAAAGTCAAGAATTCCAAAGGAAGCGGAGCATATAAATGGGGATCGCACG
CTCTATCAATTTGGTGCGACATTTACTCAAAAAGCTCTAATGAAAGCAAATGAAATTTTA
GAGACACAAAGTTCTAATGCTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCT
ACGATGTCTTATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAAAACCAGTTT
AATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCTCCAAGAGGATTTTATAATC
AATGGTGATGATTATCAAATAGTAAAAGGAGATGGAGAGAGTTTTAAACTGTTTTCGGAT
AGAAAAGTTCCTGTTAGTGGAGGAACGACACAAGCAGCTTATCGAGTAGCGCAAAATCAA
CTCTCTGTAATGAGTAATGAGGGATATGCAATTAATAGTGGATATATTTATCTCTATTGG
AGAGATTACAACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTGCAACGAAA
CAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAATGGAAATATAAGACCTAAA
GGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGAGATCCTGGTGCAACTCCTCTT
GAAGCTGAGAAATTTATGCAATCAATATCAAGTAAAACAGAAAATTATACTAATGTTGAT
GATACAAATAAAATTTATGATGAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAA
CATTCTATTGTTGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAATTCCAATTA
AAAAATGGTCAAAGTTTTACACATGATGATTACGTTTTGGTTGGAAATGATGGCAGTCAA
TTAAAAAATGGTGTGGCTCTTGGTGGACCAAACAGTGATGGGGGAATTTTAAAAGATGTT
ACAGTGACTTATGATAAGACATCTCAAACCATCAAAATCAATCATTTGAACTTAGGAAGT
GGACAAAAAGTAGTTCTTACCTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAA
TTTTACAATACAAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAACCAAATACT
ATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGGTACTAACCATC
AGTAATCAGAAGAAAATGGGTGAGGTTGAATTTATTAAAGTTAATAAAGACAAACATTCA
GAATCGCTTTTGGGAGCTAAGTTTCAACTTCAGATAGAAAAAGATTTTTCTGGGTATAAG
CAATTTGTTCCAGAGGGAAGTGATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAA
GCACTTCAAGATGGTAACTATAAATTATATGAAATTTCAAGTCCAGATGGCTATATAGAG
GTTAAAACGAAACCTGTTGTGACATTTAGAATTCAAAATGGAGAAGTTACGAACCTGAAA
GCAGATCCAAATGCTAATAAAAATCAAATCGGGTATCTTGAAGGAAATGGTAAACATCTT
ATTACCAACACTCCCAAACGCCCACCAGGTGTTTTTCCTAAAACAGGGGGAATTGGTACA 
ATTGTCTATATATTAGTTGGTTCTACTTTTATGATACTTACCATTTGTTCTTTCCGTCGT 
AAACAATTG 

SEQ ID NO. 4102: SAG0649 FROM 090 GBS TYPE Ia STRAIN

GGTGAAACCCAAGATACCAATCAAGCACTTGGAAAAG
TAATTGTTAAAAAAACGGGAGACAATGCTACACCATTAGGCAAAGCGACT
TTTGTGTTAAAAAATGACAATGATAAGTCAGAAACAAGTCACGAAACGGT
AGAGGGTTCTGGAGAAGCAACCTTTGAAAACATAAAACCTGGAGACTACA
CATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTGATAAAACC
TGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGAGGGTATGGA
TGCAGATAAAGCAGAGAAACGAAAAGAAGTTTTGAATGCCCAATATCCAA
AATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTAGTTAATGTA
GAGGGTTCCAAAGTTGGTGAACAATACAAAGCATTGAATCCAATAAATGG
AAAAGATGGTCGAAGAGAGATTGCTGAAGGTTGGTTATCAAAAAAAATTA
CAGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAATTAACTGTT
GAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACCACTAGATGT
CGTTGTGCTATTAGATAATTCAAATAGTATGAATAATGAAAGAGCCAATA
ATTCTCAAAGAGCATTAAAAGCTGGGGAAGCAGTTGAAAAGCTGATTGAT
AAAATTACATCAAATAAAGACAATAGAGTAGCTCTTGTGACATATGCCTC
AACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGATC 



WO 2004/018646 



PCT/US2003/026827 



SEQUENCE LISTING 



AAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATCATAAAACT
ACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTTAACAAATGA
TGCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGGAAGCGGAGC
ATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACATTTACTCAA
AAAGCTCTAATGAAAGCAAATGAAATTTTAGAGACACAAAGTTCTAATGC
TAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTACGATGTCTT
ATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAAAACCAGTTT
AATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCTCCAAGAGGA
TTTTATAATCAATGGTGATGATTATCAAATAGTAAAAGGAGATGGAGAGA
GTTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGAGGAACGACA
CAAGCAGCTTATCGAGTACCGCAAAATCAACTCTCTGTAATGAGTAATGA
GGGATATGCAATTAATAGTGGATATATTTaTCTCTATTGGAGAGATTACA
ACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTGCAACGAAA
CAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAATGGAAATAT
AAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGAG
ATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAATCAATATCA
AGTAAAACAGAAAATTATACTAATGTTGATGATACAAATAAAATTTATGA
TGAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAACATTCTATTG
TTGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAATTCCAATTA
AAAAATGGTCAAAGTTTTACACATGATGATTACGtTTTGGtTGGAAATGA
tGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAAACAGTGATG
GGGGAATTTTAAAAGATGTTACAGTGACTTATGATAAGACATCTCAAACC
ATCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGTAGTTCTTAC
CTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAATTTTACAATA
CAAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAACCAAATACT
ATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGGT
ACTAACCATCAGTAATCAGAAGAAAATGGGTGAGGTTGAATTTATTAAAG
TTAATAAAGACAAACATTCAGAATCGCTTTTGGGAGCTAAGTTTCAACTT
CAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCCAGAGGGAAG
TGATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAAGCACTTCAAG
ATGGTAACTATAAATTATATGAAATTTCAAGTCCAGATGGCTATATAGAG
GTTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGGAGAAGTTAC
GAACCTGAAAGCAGATCCAAATGCTAATAAAAATCAAATCGGGTATCTTG
AAGGAAATGGTAAACATCTTATTACCAACACTCCCAAACGCCCACCAGGT 
GTT 

SEQ ID NO. 4103: SAG0649 FROM A909 GBS TYPE Ia STRAIN

GGTGAAACCCAAGATACCAATCAAGCACTTGGAAAA
GTAATTGTTAAAAAAACGGGGGACAATGCTACACCATTAGGCAAAGCGAC
TTTTGTGTTAAAAAATGACAATGATAAGTCAgAAACAAGTCACGAAACGG
TAGAGGGTTCTGGAGAAgCAACCTTTGAAAACATAAAACCTGGAGACTAC
ACATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTGATAAAAC
CTGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGAGGGTATGG
ATGCAGATAAAGCAGAGAAACGAAAAGAAGTTTTGAATGCCCAATATCCA
AAATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTAgTTAATGT
AGAGGGTTCCAAAGTTGGTGAACAATACAAAGCATTGAATCCAATAAATG
GAAAAGATGGTCGAAGAGAGATTGCTGAAGGTTGGTTATCAAAAAAAATT
ACAGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAATTAACTGT
TGAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACCACTAGATG
TCGTTGTGCTATTAGATAATTCAAATAGTATGAATAATGAAAGAGCCAAT
AATTCTCAAAGAGCATTAAAAGCTGGGGAAGCAGTTGAAAAGCTGATTGA
TAAAATTACATCAAATAAAGACAATAGAGTAGCTCTTGTGACATATGCCT
CAACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGAT
CAAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATCATAAAAC
TACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTTAACAAATG
ATGCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGGAAGCGGAG
CATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACATTTACTCA
AAAAGCTCTAATGAAAGCAAATGAAATTTTAGAGACACAAAGTTCTAATG
CTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTACGATGTCT
TATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAAAACCAGTT
TAATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCTCCAAGAGG
ATTTTATAATCAATGGTGATGATTATCAAATAGTAAAAGGAGATGGAGAG
AGTTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGAGGAACGAC
ACAAGCAGCTTATCGAGTACCGCAAAATCAACTCTCTGTAATGAGTAATG
AGGGATATGCAATTAATAGTGGATATATTTATCTCTATTGGAGAGATTAC
AACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTGCAACGAA
ACAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAATGGAAATA
TAAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGA
GATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAATCAATATC
AAGTAAAACAGAAAATTATACTAATGTTGATGATACAAATAAAATTTATG
ATGAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAACATTCTATT
GTTGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAATTCCAATT
AAAAAATGGTCAAAGTTTTACACATGATGATTACGtTTTGGtTGGAAATG
AtGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAAACAGTGAT
GGGGGAATTTTAAAAGATGTTACAGTGACTTATGATAAGACATCTCAAAC
CATCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGTAGTTCTTA
CCTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAATTTTACAAT
ACAAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAACCAAATAC
TATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGG
TACTAACCATCAGTAATCAGAAGAAAATGGGTGAGGTTGAATTTATTAAA
GTTAATAAAGACAAACATTCAGAATCGCTTTTGGGAGCTAAGTTTCAACT
TCAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCCAGAGGGAA
GTGATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAAGCACTTCAA
GATGGTAACTATAAATTATATGAAATTTCAAGTCCAGATGGCTATATAGA
GGTTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGGAGAAGTTA
CGAACCTGAAAGCAGATCCAAATGCTAATAAAAATCAAATCGGGTATCTT
GAAGGAAATGGTAAACATCTTATTACCAACACTCCCAAACGCCCACCAGG
TGTT 

SEQ ID NO. 4104: SAG0649 FROM 18RS21 GBS TYPE II STRAIN 

GGTGAAACCCAAGATACCAATCAAGCAC
TTGGAAAAGTAATTGTTAAAAAAACGGGAGACAaTGCTACACCaTTAGGC
AAAGCGACTTTTGTGTTAAAAAATGACAATGATAAGTCAGAAACAAGTCA
CGAAACGGTAGAGGGTTCTGGAGAAgCAACCTTTGAAAACATAAAACCTG
GAGACTACACATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACT
GATAAAACCTGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGA
GGGTATGGATGCAGATAAAGCAGAGAAACGAAaAGAAGTTTTGAATGCCC
AATATCCAAAATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTA
GTTAATGTAGAGGGTTCCAAAGTTGGTGAACAATACAAAGCATTGAATCC
AATAAATGGAAAAGATGGTCGAAGAGAGATTGCTGAAGGTTGGTTATCAA
AAAAAATTaCaGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAA
TTAACTGTTGAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACC
ACTAGATGTCGTTGTGCTATTAGATAATTCAAATAGTATGAATAATGAAA
GAGCCAATAATTCTCAAAGAGCATTAAAAGCTGGGGAAGCAGTTGAAAAG
CTGATTGATAAAATTACATCAAATAAAGACAATAGAGTAGCTCTTGTGAC
ATATGCCTCAACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAG
TTGCCGATCAAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTAT
CATAAAACTACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTT
AACAAATGATGCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGG
AAGCGGAGCATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACA
TTTACTCAAAAAGCTCTAATGAAAGCAAATGAAATTTTAGAGACACAAAG
TTCTAATGCTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTA
CGATGTCTTATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAA
AACCAGTTTAATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCT
CCAAGAGGATTTTATAATCAATGGTGATGATTATCAAATAGTAAAAGGAG
ATGGAGAGAGTTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGA
GGAACGACACAAGCAGCTTATCGAGTACCGCAAAATCAACTCTCTGTAAT
GAGTAATGAGGGATATGCAATTAATAGTGGATATATTTATCTCTATTGGA
GAGATTACAACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCT
GCAACGAAACAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAA
TGGAAATATAAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTG
TAAACGGAGATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAA
TCAATATCAAGTAAAACAGAAAATTATACTAATGTTGATGATACAAATAA
AATTTATGATGAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAAC
ATTCTATTGTTGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAA
TTCCAATTAAAAAATGGTCAAAGTTTTAGACATGATGATTACGTTTTGGT
TGGAAATGATGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAA
ACAGTGATGGGGGAATTTTAAAAGATGTTACAGTGACTTATGATAAGACA
TCTCAAACCATCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGT
AGTTCTTACCTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAAT
TTTACAATACAAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAA
CCAAATACTATtcGtgATTtCCCAATTCCCAAAATTCGTGATGTTCGTGA
GTTTCCGGTACTAACCATCAGTAATCAGAAGAAAATGGGTGAGGTTGAAT
TTATTAAAGTTAATAAAGACAAACATTCAGAATCGCTTTTGGGAGCTAAG
TTTCAACTTCAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCC
AGAGGGAAGTGATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAAG
CACTTCAAGATGGTAACTATAAATTATATGAAATTTCAAGTCCAGATGGC
TATATAGAGGTTAAAACGAAACCTGTTGTGACATTTAGAATTCAAAATGG
AGAAGTTACGAACCTGAAAGCAGATCCAAATGCTAATAAAAATCAAATCG
GGTATCTTGAAGGAAATGGTAAACATCTTATTACCAACACTCCCAAACGC
CCACCAGGTGTT

SEQ ID NO. 4105: SAG0649 FROM M732 GBS TYPE III STRAIN 

GGTGAAACCCAAGATACCAATCAAGCACT
TGGAAAAGTAATTGTTAAAAAAACGGGAGACAaTGCTACACCATTAGGCA
AAGCGACTTTTGTGTTAAAAAATGACAATGATAAGTCAGAAACAAGTCAC
GAAACGGTAGAGGGTTCTGGAGAAGCAACCTTTGAAAACATAAAACCTGG
AGACTACACATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTG
ATAAAACCTGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGAG
GGTATGGATGCAGATAAAGCAGAGAAACGAAAAGAAGTTTTGAATGCCCA
ATATCCAAAATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTAg
TTAATGTAGAGGGTTCCAAAGTTGGTGAACAATACAAAGCATTGAATCCA
ATAAATGGAAAAGATGGTCGAAGAGAGATTGCTGAAGGTTGGTTATCAAA
AAAAAaTaCaGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAAT
TAACTGTTGAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACCA
CTAGATGTCGTTGTGCTATTAGATAATTCAAATAGTATGAATAATGAAAG
AGCCAATAATTCTCAAAGAGCATTAAAaGCTGGGGAAGCAGTTGAAAAGC
TGATTGATAAAATTACATCAAATAAAGACAATAGAGTAGCTCTTGTGACA
TATGCCTCAACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGT
TGCCGATCAAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATC
ATAAAACTACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTTA
ACAAATGATGCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGGA
AGCGGAGCATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACAT
TTACTCAAAAAGCTCTAATGAAAGCAAATGAAATTTTAGAGACACAAAGT
TCTAATGCTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTAC
GATGTCTTATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAAA
ACCAGTTTAATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCTC
CAAGAGGATTTTATAATCAATGGTGATGATTATCAAATAGTAAAAGGAGA
TGGAGAGAGTTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGAG
GAACGACACAAGCAGCTTATCGAGTACCGCAAAATCAACTCTCTGTAATG
AGTAATGAGGGATATGCAATTAATAGTGGATATATTTATCTCTATTGGAG
AGATTAGAACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTG
CAACGAAACAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAAT
GGAAATATAAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTGT
AAACGGAGATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAAT
CAATATCAAGTAAAACAGAAAATTATACTAATGTTGATGATACAAATAAA
ATTTATGATGAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAACA
TTCTATTGTTGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAAT
TCCAATTAAAAAATGGTCAAAGTTTTACACATGATGATTACGtTTTGGtT
GGAAATGAtGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAAA
CAGTGATGGGGGAATTTTAAAAGATGTTACAGTGACTTATGATAAGACAT
CTCAAACCATCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGTA
GTTCTTACCTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAATT
TTAGAATACAAATAATCGTACAaCGCTAAGTCCGAAGAGTGAAAAAGAAC
CAAATACTATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAG
TTTCCGGTACTAACCATCAGTAATCAGAAGAAAATGGGTGAGGTTGAATT
TATTAAAGTTAATAAAGACAAACATTCAGAATCGCTTTTGGGAGCTAAGT
TTCAACTTCAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCCA
GAGGGAAGTGATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAAGC
ACTTCAAGATGGTAACTATAAATTATATGAAATTTCAAGTCCAGATGGCT
ATATAGAGGTTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGGA
GAAGTTAGGAACCTGAAAGCAGATCCAAATGCTAATAAAAATCAAATCGG
GTATCTTGAAGGAAATGGTAAACATCTTATTACCAACACTCCCAAACGCC
CACCAGGTGTT 

SEQ ID NO. 4106: SAG0649 FROM COH1 GBS TYPE III STRAIN 

GGTGAAACCCAAGATACCAATCAAGCACTTGGAAAAG
TAATTGTTAAAAAAACGGGAGACAaTGCTACACCATTAGGCAAAGCGACT
TTTGTGTTAAAAAATGACAATGATAAGTCAGAAACAAGTCACGAAACGGT
AGAGGGTTCTGGArAAGCAACCTTTGAAAACATAAAACCTGGAGACTACA
CATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTGATAAAACC
TGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGAGGGTATGGA
TGCAGATAAAGCAGAGAAACGAAAAGAAGTTTTGAATGCCCAATATCCAA
AATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTAgTTAATGTA
GAGGGTTCCAAAGTTGGTGAACAATaCAAAGCATTGAATCCAATAAATGG
AAAAGATGGTCGAAGAGAGATTGCTGAAGGTTGGTTATCAAAAAAAAATA
CAGGGGTCAATGATCTCgATAAGAATAAATATAAAA-TTGAATTAACTGTT
GAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACCACTAGATGT
CGTTGTGCTATTAGATAATTCAAATAGTATGAATAATGAAAGAGCCAATA
ATTCTCAAAGAGCATTAAAAGCTGGGGAAGCAGTTGAAAAGCTGATTGAT
AAAATTACATCAAATAAAGACAATAGAGTAGCTCTTGTGACATATGCCTC
AACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGATC
AAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATCATAAAACT
ACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTTAACAAATGA
TGCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGGAAGCGGAGC
ATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACATTTACTCAA
AAAGCTCTAATGAAAGCAAATGAAATTTTAGAGACACAAAGTTCTAATGC
TAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTACGATGTCTT
ATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAAAACCAGTTT
AATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCTCCAAGAGGA
TTTTATAATCAATGGTGATGATTATCAAATAGTAAAAGGAGATGGAGAGA
GTTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGAGGAACGACA
CAAGCAGCTTATCGAGTACCGCAAAATCAACTCTCTGTAATGAGTAATGA
GGGATATGCAATTAATAGTGGATATATTTATCTCTATTGGAGAGATTACA
ACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTGCAACGAAA
CAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAATGGAAATAT
AAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGAG
ATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAATCAATATCA
AGTAAAACAGAAAATTATACTAATGTTGATGATACAAATAAAATTTATGA
TGAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAACATTCTATTG
TTGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAATTCCAATTA
AAAAATGGTCAAAGTTTTACACATGATGATTACGTTTTGGTTGGAAATGA
TGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAAACAGTGATG
GGGGAATTTTAAAAGATGTTACAGTGACTTATGATAAGACATCTCAAACC
ATCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGTAGTTCTTAC
CTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAATTTTACAATA
CAAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAACCAAATACT
ATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGGT
ACTAACCATCAGTAATCAGAAGAAAATGGGTGAGGTTGAATTTATTAAAG
TTAATAAAGACAAACATTCAgAATCGCTTTTGGGAGCTAAGTTTCAACTT
CAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCCAGAGGGAAG
TGATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAAGCACTTCAAG
ATGGTAACTATAAATTATATGAAATTTCAAGTCCAgATGGCTATATAGAG
GTTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGGAGAAGTTAC
GAACCTGAAAGCAGATCCAAATGCTAATAAAAATCAAATCGGGTATCTTG
AAGGAAATGGTAAACATCTTATTACCAACACTCCCAAACGCCCACCAGGT
GTT 

SEQ ID NO. 4107: SAG0649 FROM M781 GBS TYPE III STRAIN 

TTGGAAAAGTAATTGTTAAAAAAACGGGAGACACTGCTACACCATTAGGC
AAAGCGACTTTTGTGTTAAAAAATGACAATGATAAGTCAGAAACAAGTCA
CGAAACGGTAGAGGGTTCTGGAAAAGCAACCTTTGAAAACATAAAACCTG
GAGACTACACATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACT
GATAAAACCTGGAAAGTTAAAGTTGCAGATAACGGAGCAmCAATAATCGA
GGGTATGGATGCAGATAAAGCAGAGAAACGAAAAGAAGTTTTGAATGCCC
AATATCCAAAATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTA
gTTAATGTAGAGGGTTCCAAAGTTGGTGAACAATACAAAGCATTGAATCC
AATAAATGGAAAAGATGGTCgAAGAGAGATTGCTGAAGGTTGGTTATCAA
AAAAAATTAGaGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAA
TTAACTGTTGAGGGTAAAACCACTGTTGAAACgAAAGAACTTAATCAACC
ACTAGATGTCGTTGTGCTATTAGATAATTCAAATAGTATGAATAATGAAA
GAGCCAATAATTCTCAAAGAGCATTAAAAGCTGGGGAAGCAGTTGAAAAG
CTGATTGATAAAATTACATCAAATAAAGACAATAGAGTAGCTCTTGTGAC
ATATGCCTCAACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAG
TTGCCGATCAAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTAT
CATAAAACTACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTT
AACAAATGATGCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGG
AAGCGGAGCATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACA
TTTACTCAAAAAGCTCTAATGAAAGCAAATGAAATTTTAGAGACACAAAG
TTCTAATGCTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTA
CGATGTCTTATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAA
AACCAGTTTAATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCT
CCAAGAGGATTTTATAATCAATGGTGATGATTATCAAATAGTAAAAGGAG
ATGGAGAGAGTTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGA
GGAACGACACAAGCAGCTTATCGAGTACCGCAAAATCAACTCTCTGTAAT
GAGTAATGAGGGATATGCAATTAATAGTGGATATATTTATCTCTAtTGGA
GAGATTACAACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCT
GCAACGAAACAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAA
TGGAAATATAAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTG
TAAACGGAGATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAA
TCAATATCAAGTAAAACAGAAAATTATACTAATGTTGATGATACAAATAA
AATTTATGATGAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAAC
ATTCTATTGTTGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAA
TTCCAATTAAAAAATGGTCAAAGTTTTACACATGATGATTACGTTTTGGT
TGGAAATGATGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAA
ACAGTGATGGGGGAATTTTAAAAGATGTTACAGTGACTTATGATAAGACA
TCTCAAACCATCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGT
AGTTCTTACCTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAAT
TTTACAATACAAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAA
CCAAATACTATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGA
GTTTCCGGTACTAACCATCAGTAATCAGAAGAAAATGGGTGAGGTTGAAT
TTATTAAAGTTAATAAAGACAAACATTCAGAATCGCTTTTGGGAGCTAAG
TTTCAACTTCAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCC
AGAGGGAAGTGATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAAG
CACTTCAAGATGGTAACTATAAATTATATGAAATTTCAAGTCCAGATGGC
TATATAGAGGTTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGG
AGAAGTTACGAACCTGAAAGCAGATCCAAATGCTAATAAAAATCAAATCG
GGTATCTTGAAGGAAATGGTAAACATCTTATTACCAACACTCCCAAACGC
CCACCAGGTGTT 

SEQ ID NO. 4108: SAG0649 FROM CJB GBS NONTYPEABLE STRAIN

GGTGAAACCCAAGATACCAATCAAGCACTTGGAAAAGT
AATTGTTAAAAAAACGGGAGACAATGCTACACCATTAGGCAAAGCGACTT
TTGTGTTAAAAAATGACAATGATAAGTCAGAAACAAGTCACGAAACGGTA
GAGGGTTCTGGArAAGCAACCTTTGAAAACATAAAACCTGGAGACTACAC
ATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTGATAAAACCT
GGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGAGGGTATGGAT
GCAGATAAAGCAGAGAAACGAAAAGAAGTTTTGAATGCCCAATATCCAAA
ATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTAGTTAATGTAG
AGGGTTCCAAAGTTGGTGAACAATACAAAGCATTGAATCCAATAAATGGA
AAAGATGGTCGAAGAGAGATTGCTGAAGGTTGGTTATCAAAAAAAATTAC
AGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAATTAACTGTTG
AGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACCACTAGATGTC
GTTGTGCTATTAGATAATTCAAATAGTATGAATAATGAAAGAGCCAATAA
TTCTCAAAGAGCATTAAAAGCTGGGGAAGCAGTTGAAAAGCTGATTGATA
AAATTACATCAAATAAAGACAATAGAGTAGCTCTTGTGACATATGCCTCA
ACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGATCA
AAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATCATAAAACTA
CTTTTAGAGCAACTACACATAATTACAGTTATTTAAATTTAACAAATGAT
GCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGGAAGCGGAGCA
TATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACATTTACTCAAA
AAGCTCTAATGAAAGCAAATGAAATTTTAGAGACACAAAGTTCTAATGCT
AGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTACGATGTCTTA
TGCCATAAATTTTAATCCTTATATATCAACATCTTACCAAAACCAGTTTA
ATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCTCCAAGAGGAT
TTTATAATCAATGGTGATGATTATCAAATAGTAAAAGGAGATGGAGAGAG
TTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGAGGAACGACAC
AAGCAGCTTATCGAGTACCGCAAAATCAACTCTCTGTAATGAGTAATGAG
GGATATGCAATTAATAGTGGATATATTTATCTCTATTGGAGAGATTACAA
CTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTGCAACGAAAC
AAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAATGGAAATATA
AGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGAGA
TCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAATCAATATCAA
GTAAAACAGAAAATTATACTAATGTTGATGATACAAATAAAATTTATGAT
GAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAACATTCTATTGT
TGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAATTCCAATTAA
AAAATGGTCAAAGTTTTACACATGATGATTACGTTTTGGTTGGAAATGAT
GGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAAACAGTGATGG
GGGAATTTTAAAAGATGTTACAGTGACTTATGATAAGACATCTCAAACCA
TCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGTAGTTCTTACC
TATGATGTACGTTTAAAAGATAACTATATAAGTAACAAATTTTACAATAC
AAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAACCAAATACTA
TTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGGTA
CTAACCATCAGTAATCAGAAGAAAATGGGTGAGGTTGAATTTATTAAAGT
TAATAAAGACAAACATTCAGAATCGCTTTTGGGAGCTAAGTTTCAACTTC
AGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCCAGAGGGAAGT
GATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAAGCACTTCAAGA
TGGTAACTATAAATTATATGAAATTTCAAGTCCAGATGGCTATATAGAGG
TTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGGAGAAGTTACG
AACCTGAAAGCAGATCCAAATGCTAATAAAAATCAAATCGGGTATCTTGA
AGGAAATGGTAAACATCTTATTACCAACACTCCCAAACGCCCACCAGGTG
TT

SEQ ID NO. 4109: SAG0649 FROM JM9130013 GBS TYPE VIII STRAIN

GGTGAAACCCAAGATACCAATCAAGCACTTGGAAAAG
TAATTGTTAAAAAAACGGGAGACAATGCTACACCATTAGGCAAAGCGACT
TTTGTGTTAAAAAATGACAATGATAAGTCAGAAACAAGTCACGAAACGGT
AGAGGGTTCTGGAGAAGCAACCTTTGAAAACATAAAACCTGGAGACTACA
CATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTGATAAAACC
TGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGAGGGTATGGA
TGCAGATAAAGCAGAGAAACGAAAAGAAGTTTTGAATGCCCAATATCCAA
AATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTAGTTAATGTA
GAGGGTTCCAAAGTTGGTGAACAATACAAAGCATTGAATCCAATAAATGG
AAAAGATGGTCGAAGAGAGATTGCTGAAGGTTGGTTATCAAAAAAAATTA
CAGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAATTAACTGTT
GAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACCACTAGATGT
CGTTGTGCTATTAGATAATTCAAATAGTATGAATAATGAAAGAGCCAATA
ATTCTCAAAGAGCATTAAAAGCTGGGGAAGCAGTTGAAAAGCTGATTGAT
AAAATTACATCAAATAAAGACAATAGAGTAGCTCTTGTGACATATGCCTC
AACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGATC
AAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATCATAAAACT
ACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTTAACAAATGA
TGCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGGAAGCGGAGC
ATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACATTTACTCAA
AAAGCTCTAATGAAAGCAAATGAAATTTTAGAGACACAAAGTTCTAATGC
TAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTACGATGTCTT
ATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAAAACCAGTTT
AATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCTCCAAGAGGA
TTTTATAATCAATGGTGATGATTATCAAATAGTAAAAGGAGATGGAGAGA
GTTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGAGGAACGACA
CAAGCAGCTTATCGAGTACCGCAAAATCAACTCTCTGTAATGAGTAATGA
GGGATATGCAATTAATAGTGGATATATTTATCTCTATTGGAGAGATTACA
ACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTGCAACGAAA
CAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAATGGAAATAT
AAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGAG
ATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAATCAATATCA
AGTAAAACAGAAAATTATACTAATGTTGATGATACAAATAAAATTTATGA
TGAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAACATTCTATTG
TTGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAATTCCAATTA
AAAAATGGTCAAAGTTTTACACATGATGATTACGTTTTGGTTGGAAATGA
TGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAAACAGTGATG
GGGGAATTTTAAAAGATGTTACAGTGACTTATGATAAGACATCTCAAACC
ATCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGTAGTTCTTAC
CTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAATTTTACAATA
CAAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAACCAAATACT
ATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGGT
ACTAACCATCAGTAATCAAAAGAAAATGGGTGAGGTTGAATTTATTAAAG
TTAATAAAGACAAACATTCAGAATCGCTTTTGGGAGCTAAGTTTCAACTT
CAGATAAAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCCAGAGGGAAG
TGATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAAGCACTTCAAG
ATGGTAACTATAAATTATATGAAATTTCAAGTCCAGATGGCTATATAGAG
GTTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGGAGAAGTTAC
GAACCTGAAAGCAGATCCAAATGCTAATAAAAATCAAATCGGGTATCTTG
AA

SEQ ID NO. 4110: SAG0649 FROM 2603 V/R GBS TYPE V STRAIN 

MKKRQKIWRGLSVTLLILSQIPFGILVQGETQDTNQALGKVIVKKTGDNATPLGKATFVL
KNDNDKSETSHETVEGSGEATFENIKPGDYTLREETAPIGYKKTDKTWKVKVADNGATII 
EGMDADKAEKRKEVLNAQYPKSAIYEDTKENYPLVNVEGSKVGEQYKALNPINGKDGRRE 
IAEGWLSKKITGVNDLDKNKYKIELTVEGKTTVETKELNQPLDVVVLLDNSNSMNNERAN 
NSQRALKAGEAVEKLIDKITSNKDNRVALVTYASTIFDGTEATVSKGVADQNGKALNDSV 
SWDYHKTTFTATTHNYSYLNLTNDANEVNILKSRIPKEAEHINGDRTLYQFGATFTQKAL 
MKANEILETQSSNARKKLIFHVTDGVPTMSYAINFNPYISTSYQNQFNSFLNKIPDRSGI 
LQEDFIINGDDYQIVKGDGESFKLFSDRKVPVTGGTTQAAYRVPQNQLSVMSNEGYAINS
GYIYLYWRDYNWVYPFDPKTKKVSATKQIKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNG 
DPGATPLEAEKFMQSISSKTENYTNVDDTNKIYDELNKYFKTIVEEKHSIVDGNVTDPMG 
EMIEFQLKNGQSFTHDDYVLVGNDGSQLKNGVALGGPNSDGGILKDVTVTYDKTSQTIKI
NHLNLGSGQKVVLTYDVRLKDNYISNKFYNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVR
EFPVLTISNQKKMGEVEFIKVNKDKHSESLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKN 
DGKIYFKALQDGNYKLYEISSPDGYIEVKTKPVVTFTIQNGEVTNLKADPNANKNQIGYL
EGNGKHLITNTPKRPPGVFPKTGGIGTIVYILVGSTFMILTICSFRRKQL 

SEQ ID NO. 4111: SAG0649 FROM 090 GBS TYPE Ia STRAIN

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD 
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV

SEQ ID NO. 4112: SAG0649 FROM A909 GBS TYPE Ia STRAIN

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV

SEQ ID NO. 4113: SAG0649 FROM 18RS21 GBS TYPE II STRAIN

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA 
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV

SEQ ID NO. 4114: SAG0649 FROM M732 GBS TYPE III STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKNTGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA 
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV

SEQ ID NO. 4115: SAG0649 FROM COH1 GBS TYPE III STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGXATFENIKPG
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKNTGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV

SEQ ID NO. 4116: SAG0649 FROM M781 GBS TYPE III STRAIN

GKVIVKKTGDTATPLGKATFVLKNDNDKSETSHETVEGSGKATFENIKPGDYTLREETAP
IGYKKTDKTWKVKVADNGAXIIEGMDADKAEKRKEVLNAQYPKSAIYEDTKENYPLVNVE
GSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVEGKTTVETKEL 
NQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVALVTYASTIFD 
GTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEVNILKSRIPKE 
AEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPTMSYAINFNPY
ISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDRKVPVTGGTTQ 
AAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQIKTHGEPTTL 
YFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDDTNKIYDELNK
YFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQLKNGVALGGPN
SDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKFYNTNNRTTLS
PKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSESLLGAKFQLQ 
IEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEVKTKPVVTFTI
QNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV 

SEQ ID NO. 4117: SAG0649 FROM CJB110 GBS NONTYPEABLE STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGXATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYXSTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV

SEQ ID NO. 4118: SAG0649 FROM JM9130013 GBS TYPE VIII STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD 
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIKKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLE

SEQ ID NO. 4201: 2603 V/R STRAIN 

ATGGTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTGGAATAAAGCTAACCTTTTC 
ACTGGATGGGCTGACGTAGATCTTTCAGAAAAAGGTACACAACAAGCTATTGATGCTGGG
AAATTAATTCAAGCAGCAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGT
GCCATCAAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAA
AAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAAATAAAGCAGAA
GCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCGTCGTTCATATGATGTATTG
CCTCCAGATATGGCTAAAGATGATGAACATTCAGCACATACTGATCGTCGCTATGCTTCA
CTAGATGATTCTGTTATTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTT
CCTTTCTGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGT
GCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCAGATGATGAA
ATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTTCGAATTTGATGAAAAATTA
AACCTTGTTTCAGAATATTACTTAGGTAAA

SEQ ID NO. 4202: 090 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTG
GAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGAAA
AAGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAGGT
ATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAAAC
AACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAAA
AATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAAAT
AAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCG
TCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACATT
CAGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTCCA
GATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGGGA
AGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGTG
CACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCA
GATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTT
CGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAAA

SEQ ID NO. 4203: A909 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTGG
AATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGAAAA
AGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAGGTA
TTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAAACA
ACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAAAA
ATCATGGCGCTTAAACGAACGTCATTACGGTGGATTGACAGGAAAAAATA
AAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCGT
CGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACATTC
AGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTCCAG
ATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGGGAA
GATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGTGC
ACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCAG
ATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTTC
GAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAAA

SEQ ID NO. 4204: H36B STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAG
TGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGA
AAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAG
GTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAA
ACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGA
AAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAA
ATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGG
CGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACA
TTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTC
CAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGG
GAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGG
TGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGT
CAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTT
TTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAA
A

SEQ ID NO. 4205: 18RS21 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTGG
AATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGAAAA
AGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAGGTA
TTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAAACA
ACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAAAA
ATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAAATA
AAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCGT
CGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACATTC
AGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTCCAG
ATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGGGAA
GATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGTGC
ACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCAG
ATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTTC
GAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAAA

SEQ ID NO. 4206: M732 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTGG
AATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGAAAA
AGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAGGTA
TTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAAACA
ACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAAAA
ATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAAATA
AAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCGT
CGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACATTC
AGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTCCAG
ATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGGGAA
GATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGTGC
ACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCAG
ATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTTC
GAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAAA

SEQ ID NO. 4207: COH1 STRAIN 

GTAAAATTAGTATTCGCACGCCACGG
TGAATCTGAGTGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAG
ATCTTTCAGAAAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATT
CAAGCAGCAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACG
TGCCATCAAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGG
TACCAGTTGAAAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTG
ACAGGAAAAAATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGT
TCATATTTGGCGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAG
ATGATGAACATTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGAT
TCTGTTATTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCT
TCCTTTCTGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATG
TGTTTGTTGGTGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATC
AAACAATTGTCAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCC
ACCACTTGTTTTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATT
ACTTAGGTAAA

SEQ ID NO. 4208: CJB110 STRAIN 

GTAAAATTAGTATTCGCACGCCACGG
TGAATCTGAGTGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAG
ATCTTTCAGAAAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATT
CAAGCAGCAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACG
TGCCATCAAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGG
TACCAGTTGAAAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTG
ACAGGAAAAAATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGT
TCATATTTGGCGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAG
ATGATGAACATTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGAT
TCTGTTATTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCT
TCCTTTCTGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATG
TGTTTGTTGGTGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATC
AAACAATTGTCAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCC
ACCACTTGTTTTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATT
ACTTAGGTAAA

SEQ ID NO. 4209: 1169NT STRAIN 

AGTATTCGCACGCCACGGTGAATCTGAGTGGAATAAAGCTAACCTTTTCA 
CTGGATGGGCTGACGTAGATCTTTCAGAAAAAGGTACACAACAAGCTATT
GATGCTGGGAAATTAATTCAAGCAGCAGGTATTGAGTTCGACCTTGCTTT
TAGATCAGTTCTTAAACGTGCCATCAAAACAACTAACCTTGCCCTTGAAG
CAGCTGATCAACTTTGGGTACCAGTTGAAAAATCATGGCGCTTGAACGAA
CGTCATTACGGTGGATTGACAGGAAAAAATAAAGCAGAAGCAGCTGAACA
ATTTGGTGATGAGCAAGTTCATATTTGGCGTCGTTCATATGATGTATTGC
CTCCAGATATGGCTAAAGATGATGAACATTCAGCACATACTGATCGTCGC
TATGCTTCACTAGATGATTCTGTTATTCCAGATGCAGAAAACCTAAAAGT
TACTTTAGAGCGTGCTCTTCCTTTCTGGGAAGATAAAATTGCTCCTGCTC
TTAAAGATGGTAAAAATGTGTTTGTTGGTGCACACGGTAACTCAATCCGT
GCTCTTGTAAAACATATCAAACAATTGTCAGATGATGAAATCATGGACGT
TGAAATTCCTAACTTCCCACCACTTGTTTTCGAATTTGATGAAAAATTAA
ACCTTGTTTCAGAATATTACTTAGGTAAA 

SEQ ID NO. 4210: M781 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGT
GAATCTGAGTGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGA
TCTTTCAGAAAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATTC
AAGCAGCAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGT
GCCATCAAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGT
ACCAGTTGAAAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGA
CAGGAAAAAATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTT
CATATTTGGCGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGA
TGATGAACATTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGATT
CTGTTATTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTT
CCTTTCTGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGT
GTTTGTTGGTGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCA
AACAATTGTCAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCA
CCACTTGTTTTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTA
CTTAGGTAAA

SEQ ID NO. 4211: JM9130013 STRAIN

GTAAAATTAGTATTCGCACGCCACGGTGAATCT
GAGTGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTC
AGAAAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAG
CAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATC
AAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGT
TGAAAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAA
AAAATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATT
TGGCGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGA
ACATTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTA
TTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTC
TGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGT
TGGTGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAAT
TGTCAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTT
GTTTTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGG
TAAA

SEQ ID NO. 4212: 2603 V/R STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4213: 090 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4214: A909 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK

SEQ ID NO. 4215: H36B STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK

SEQ ID NO. 4216: 18RS21 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK

SEQ ID NO. 4217: M732 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK

SEQ ID NO. 4218: COH1 STRAIN

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK

SEQ ID NO. 4219: CJB110 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4220: 1169NT STRAIN 

VFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRAIKT 
TNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLPPDM 
AKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGAHGN 
SIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4221: M781 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4222: JM9130013 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK

SEQ ID NO. 4301: 2603 V/R STRAIN 

ATGAATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATC 
GTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCT 
AATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCT 
GATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAA
GGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACG
CTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGT
CTTATAGAGCGTTTGAGTGkTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAA
GTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAG 
CCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAA 
CACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTT
TTTGCAGATGTTGAAAAAGCGTTGCTAGAACTCAAA 

SEQ ID NO. 4302: 090 STRAIN (reverse complement) 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCA
AGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCG
CGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGG
TGAATTGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGA
TATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGC
CTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGA
AACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACG
TGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGA
ACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGA
AATAACAGAAGTTTTTGCAGATGTTGAAAAAGCGTTG

SEQ ID NO. 4303: 1169NT STRAIN (REVERSE COMPLEMENT) 

TGGTAAAGGGACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCGCACATCTCAAC 
AGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAG 
TTATATTGATAAAGGTGAATTGGTTCCTGATCAAGTAACAAACGGGATTGTAAAAGAGCG
CTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGGTATCCACGTACTAT
TGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGT 
TATTAATATTAAAGTGGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAA 
TCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGA 
AGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTCA
TATTGCTCAAGGAGAACCTATTCTTGAACACTATAGTAAGCTTGGCCTTGTTACAGATAT
TGAAGGTAATCAAGAAATAA

SEQ ID NO. 4304: 18RS21 STRAIN (REVERSE COMPLEMENT)

AATCTTTTAACCACGGGTTCGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCG
TTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTA
ATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTG
ATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAG
GTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGC
TTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTC
TTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAG
TGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGC
CTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAAC
ACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTT
TTGCAGATGTTGAAAAAGCGTTG

SEQ ID NO. 4305: A909 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAG
CTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCG
CAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAAT
TGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCG
CAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAG 
ATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATC 
CATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTT 
TCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAG 
ATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAATCTA 
TTCTTGAACACTATCGAAAGCTTGGTCTTGTTACAGATATTGAAGGTAA

SEQ ID NO. 4306: CJB110 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAACCACGGGTTTGCTTGGTGCTGGTAAAGGTACTCAAGCAGCTAA 
GATCGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAAT
GGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGT
TCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGA
AAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGC
TACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATC
ATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCA
CAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGA
TAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCT
TGAACACTATAG 

SEQ ID NO. 4307: COH1 STRAIN (REVERSE COMPLEMENT) 

ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTG 
AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAATC 
AAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGATG
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTT
TTTTACTTGATGGATATCCACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTG
AAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCAACATGCCTTA
TAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGT
TCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTG
AAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACACT
ATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTG
CAGATGTTGAAAAAGCGTTG

SEQ ID NO. 4308: H36B STRAIN (REVERSE COMPLEMENT) 

CAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAA 
GTTATATTGATAAAGGTGAATTGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGC
GCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTA
TTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTG
TTATTAATATTAAAGTGGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCA
ATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAG
AAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTA
ATATTGCTCAAGGAGAATCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATA
TTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAGATGTTGAAAAAGCGTTG

SEQ ID NO. 4309: JM9130013 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGT
ACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATG
TTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGAT
AAAGGTGAATTGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAG
GATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCA
CACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATT
AAAGTGGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACT
GGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTAGTAT
CAACGTGAAGATGATAAGCCTGAAACTGTTAAACGTCGCTTGGACGTTAATATTGCTCAA
GGAGAACCTATTCTTGAACACTATAAAAAGCTTGGTCTTGTTACAGATATTGAAGGTAAT
CA

SEQ ID NO. 4310: M732 STRAIN (REVERSE COMPLEMENT) 

CTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAA 
GAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAA
ACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGATGAA
GTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTT
TTACTTGATGGATATCCACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTGAA
GAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCAACATGCCTTATA 
GAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTC 
AACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAA
ACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACACTAT
CGTAAGCTTGGTCTTGTTAGAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCA
GATGTTGAAAAAGCGTTG 

SEQ ID NO. 4311: M781 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTACGGGTTTGCCTGGTGCTGGTAAAGGTACTCAA
GCAGCTAAGATTGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGC
GCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGT
GAATTGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGAT
ATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAGCAAGCACACGCC
TTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTG
GATCCAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAA
ACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGT
GAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAA

SEQ ID NO. 4312: 2603 V/R STRAIN

MNLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVP

DEVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSC 
LIERLSXRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILE 
HYRKLGLVTDIEGNQEITEVFADVEKALLELK 

SEQ ID NO. 4313: 090 STRAIN 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
YRKLGLVTDIEGNQEITEVFADVEKALLELK

SEQ ID NO. 4314: 1169NT STRAIN 

GKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPDQVTNGIVKER
LAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIIN
RKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVHIAQGEPILEHYSKLGLVTDI
EGNQEI

SEQ ID NO. 4315: 18RS21 STRAIN 

NLLTTGSPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
YRKLGLVTDIEGNQEXTEVFADVEKALLE

SEQ ID NO. 4316: A909 STRAIN 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH
YRKLGLVTDIEG

SEQ ID NO. 4317: A909 STRAIN

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH
YRKLGLVTDIEG

SEQ ID NO. 4318: CJB110 STRAIN 

NLLTTGLLGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
Y

SEQ ID NO. 4319: COH1 STRAIN

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY
RKLGLVTDIEGNQEITEVFADVEKALL

SEQ ID NO. 4320: H36B STRAIN 

GDMFRAAMANQTEMGRLAKSYIDKGELVPDEVTNGIVKERLAEDDIAEKGFLLDGYPRTI 
EQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIINRKTGETFHKVFNPPVDYKEE 
DYYQREDDKPETVKRRLDVNIAQGESILEHYRKLGLVTDIEGNQEITEVFADVEKAL

SEQ ID NO. 4321: JM9130013 STRAIN 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
YKKLGLVTDIEGN

SEQ ID NO. 4322: M732 STRAIN 

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY
RKLGLVTDIEGNQEITEVFADVEKALLELK

SEQ ID NO. 4323: M781 STRAIN 

NLLITGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQ

SEQ ID NO. 4401 
STRAIN 2603 

GTGGATAAACATCACTCAAAAAAGGCTATTTTAAAGTTAACA
CTTATAACAACTAGTATTTTATTAATGCATAGCAATCAAGTGAATGCAGAGGAGCAAGAA
TTAAAAAACCAAGAGCAATCACCTGTAATTGCTAATGTTGCTCAACAGCCATCGCCATCG
GTAACTACTAATACTGTTGAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGCG
AAAGAAATGGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGAAGAG
TTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAGAAGAATATCCCTCT
AAACCAGAGACAACCAACAATAAAGAAAGCAATGTAGTAACAAATGCTTCAACTGCAATA
GCACAGAAAGTTCCCTCAGCATATGAAGAGGTGAAGCCAGAAAGCAAGTCATCGCTTGCT
GTTCTTGATACATCTAAAATAACAAAATTACAAGCCATAACCCAAAGAGGAAAGGGAAAT
GTAGTAGCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGATAGC
CCAAAAGATGATAAGCACAGCTTTAAAACTAAGACAGAATTTGAGGAATTAAAAGCAAAA
CATAATATCACTTATGGGAAATGGGTTAACGATAAGATTGTTTTTGCACATAACTACGCC
AACAATACAGAAACGGTGGCTGATATTGCAGCAGCTATGAAAGATGGTTATGGTTCAGAA 
GCAAAGAATATTTCGCATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGT 
CCAGCAATCAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAATG 
CGTATTCCAGATAAAATTGATTCGGACAAATTTGGTGAAGCATATGCTAAAGCAATCACA
GACGCTGTTAATCTAGGAGCAAAAACGATTAATATGAGTATTGGAAAAACAGCTGATTCT
TTAATTGCTCTCAATGATAAAGTTAAATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTT
GCAGTTGTTGTGGCTGCCGGAAATGAAGGCGCATTTGGTATGGATTATAGCAAACCATTA 
TCAACTAATCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTTTGAGT 
GTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGAAACAACTATTGAAGGT 
AAGTTAGTTAAGTTGCCGATTGTGACTTCTAAACCTTTTGACAAAGGTAAGGCCTACGAT

WO 2004/018646 PCT/US2003/026827

SEQUENCE LISTING

GTGGTTTATGCCAATTATGGTGCAAAAAAAGACTTTGAAGGTAAGGACTTTAAAGGTAAG
ATTGCATTAATTGAGCGTGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACA 
AATGCAGGTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTTCTA 
ATTCCTTACCGTGAATtACCTGTGGGGATTATTAGTAAAGTAGATGGCGAGCGTATAAAA
AATACTTCAAGTCAGTTAACATTTAACCAGAGTTTTGAAGTAGTTGATAGCCAAGGTGGT
AATCGTATGCTGGAACAATCAAGTTGGGGCGTGACAGCTGAAGGAGCAATCAAGCCTGAT
GTAACAGCTTCTGGCTTTGAAATTTATTCTTCAACCTATAATAATCAATACCAAACAATG
TCTGGTACAAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGTCAT
TTGGCTGAGAAATATAAAGGGATGAATTTAGATTCTAAAAAATTGCTAGAATTGTCTAAA
AACATCCTCATGAGCTCAGCAACAGCATTATATAGTGAAGAGGATAAGGCGTTTTATTCA
CCACGTCAGCAAGGTGCAGGTGTAGTTGATGCTGAAAAAGCTATCCAAGCTCAATATTAT
ATTACTGGAAACGATGGCAAAGCTAAAATTAATCTCAAACGAATGGGAGATAAATTTGAT
ATCACAGTTACAATTCATAAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAGCTAAT
GTAGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCTTAAACCACAAGCCTTGCTAGAT
ACTAATTGGCAGAAAGTAATTCTTCGTGATAAAGAAACACAAGTTCGATTTACTATTGAT
GCTAGTCAATTTAGTCAGAAATTAAAAGAACAGATGGCAAATGGTTATTTCTTAGAAGGT
TTTGTACGTTTTAAAGAAGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTA
GGATTTAATGGTGATTTTGCGAACTTACAAGCACTTGAAACACCGATTTATAAGACGCTT
TCTAAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATAAAGACCAATTGGAGTAC
AATGAATCAGCTCCTTTTGAAAGCAACAACTATACTGCCTTGTTAACACAATCAGCGTCT
TGGGGCTATGTTGATTATGTCAAAAATGGTGGGGAGTTAGAATTAGCACCGGAGAGTCCA 
AAAAGAATTATTTTAGGAACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTG
GAAAGAGATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAATAGG
GACGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTAAGGATATTTCTGCTCAAGTT
CTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGGTTTTACCATCTTATCGTAAAAAT
TTCCATAATAATCCAAAGCAAAGTGATGGTCATTATCGTATGGATGCTCTTCAGTGGAGT
GGTTTAGATAAGGATGGCAAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTAC
ACACCAGTAGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTACAAGTAAGTACT
AAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGAAACTAATCGAACATTAAGCTTA
GCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATCGTTTACAATTAGTTTTATCTCAT
GTTGTAAAAGATGAAGAATATGGGGATGAGACTTCTTAGCATTATTTCCATATAGATCAA
GAAGGTAAAGTGACACTTCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGAC
CCTAAGGCCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCAACGGTAAAATTG
TCTGATCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTATAGTAATTTCTAAC
AGTTTCAAATATTTTGATAACTTGAAAAAAGAACCTATGTTTATTTCTAAAAAAGAAAAA
GTAGTAAACAAGAATCTAGAAGAAATAATATTAGTTAAGCCGCAAACTACAGTTACTACT
CAATCATTGTCTAAAGAAATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAAC
AATAATAGTAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTTAAC
CATACCTTACCTAGTACATCAGATAGAGCAACGAATGGTCTATTTGTTGGTACTTTGGCA
TTGTTATCTAGTTTACTTCTTTATTTGAAACCCAAAAAGACTAAAAATAATAGTAAA

SEQ ID NO. 4402 
STRAIN 090 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGTAATTGCT
AATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATATTGTTGAAAA
AACATCTGTAACAGCTGCTTCTGCTAGTAATACAGTGAAAGAAATGGGTG
ATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGAAGAGTTA
TCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAGAAGAATA
TCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAATGTAGTAACAA
ATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGCGTATGAAGAGGTG 
AAGCCAGAAAGCAAGTCATCGCTTGCTGTTTTTGATACATCTAAAATAAC 
AAAATTGCAAGCCATAACCCAAAGAGGAAAGGGAAATGTAGTAGCTATTA
TTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGATAGCCCA
AAAGATGATAAGCACAGCTTTAAAACTAAAGCAGAATTCGAGGAATTAAA
AGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATAAGATTGTTT
TTGCACATAACTACGCCAACAATACAGAAACGGTGGCTGATATTGCAGCA
GCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAATATTTCGCATGGTAC 
ACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCAATCAATG
GTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAATGCGT
ATTCCAGATAAAATTGATTCGGACAAATTTGGAGAAGCATATGCTAAAGC
AATCACAGACGCTGtTAATCTAGGAGCAAAAaCGATTAATATGAGCCTTG
GAAAAACAGCAGATTCTTTAAttGCaCTCAATGATAAAGTTAAATTAGCA
CTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGCTGCCGGAAA 
TGAAGGTGCATTTGGTATGGATTATAGCAAACCATTATCAACTAATcCTG
ACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTtTGAGTGTT
GCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGAAACAACTAT 
TGaaGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAACCTTTtGACA
AAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGCAaAAAAAGAC
TTTGAAGGTAAgGACTTTAAAGGTAAGATTGCATTAATtGAGCGTGGtGG 
TGGACTTGATTTTATGACTAAaatCACTcATGCTACAAATGCAgGTGTTG 
tTGGTaTCGTtATTtttAACgAtCAAGAaaAACGtGGAAATTTTcTAATT 
CCTTACCGTGAATTACCTGTGGGGGTTATTAGTAAAGTAGATGGCGAGCG 
TATAAAAAATACTTCAAGTCAGTTAACATTTAACCAGAGTTTTgAAGTAG
TTGATAGCCAAGGTGGCAATCGTATGCTGGAACAATCAAGTTGGGGCGTG 
ACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGCTTTGAAAT
TTATTCTTCAACCTATAATAATCAATACCAAACAATGTCTGGTACAAGTA
TGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGTCATTTG
GCTGAGAAATATAAAGGGATGAATTTAgATTCTAAAAAATTGCTAGAATT
GTCTAaAAACATCCTCATGAGCTCAGCaaCAGCATTATATAGTgAAGAgG
ATAAGGCGTtTtATTCaCCACGTCAGCAAGGtGCAGGtGTAGTTGATGCT 
GAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAAACGATGGCAAAGC
TAAAATTAATCTCAAACGAGTGGGAGATAAATTTGATATCACAGTTACAA
TTCATAAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAGCTAATGTA 
GCAACAGAACaAGTAAATAAAGGTAAATTTGCCCTTAAACCACAAGCCtT 
GCTAGATACTAATTGGCAGAaAGTAATTCTTcGTGATAAAGAAACACAAG
TTcGATTTACTATTGATGCTAGTCAATTTAGTCAGAAATTAAAAGAACAG
ATGGCAAATGGTTATTTCTTAgAAGGTTTTGTACGTTTTAAAGAAGCCAA 
GGATAGtAATCAGGAGTTAaTGAGTATTCCTTtTGTAGGATTTAATGGTG
ATTTTGCGAACTTACAAGCACTTGAAACACCGATTTATAAGACGCTTTCT 
AAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATAAAGACCAATT
GGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATACTGCCTTGT
TAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAATGGTGGG 
GAGTTAGAATTAGCACCGGAgAGTcCAAAAAGAATTATTTTAgGAACTTT
TGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAGATGCAG
CgAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAATAGGGAT
GAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTAAGGATATTTCTGC
TCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGGTTTTAC 
CATCTTATCGTAAAAATTTCCATAATAATCCAAAGCAAAGTGATGGTCAT
TATCGTATGGATGCCTTTCAGTGGAGTGGTTTAGATAAGGATGGCAAAGT 
TGTAGCAGATGGTTTTTATACTTATCGCCTACGTTACACACCAGTAGCAG
AAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTTCAAGTAAGTACTAAG
TCACCAAATCTTCCTTTACTAGCTCAGTTTGATGAAACTAATCGAACATT
AAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATCGTTTAC
AATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGGATGAGACT
TCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACTTCCTAA 
AACGGTTAAGATAGGAGAGAGTGAGGTTGCAGTAGACCCTAAGGCCTTGA 
CACTTGTTGTGGAAGATAAAGCTGGTAATTTTGCAACGGTAAAATTGTCT 
GACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTATAGTAAT 
TTCTAACAGTTTCAAATATTTTGATAACTTGAAAAAAGAATCTATGTTTA
TTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTAGAAGAAATAACATTA 
GTTAAGCCGCAAACTACAGTTACTACTCAATCATTGTCTAAAGAAATAAC
TAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATAGTAGCA
GAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTTAACCAT
ACC 

SEQ ID NO. 4403 
STRAIN A909 

GAGGAGCAAGAATTAAAAAACCAAGAGCAAT
CACCTGTAATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACT
AATACTGTTGAAAAAACATCTGTAACATCTGCTTCTGCTAGTAATACAGC
GAAAGAAATGGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAAT
TATTAGAAGAGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGAT
CTTGAAGAAGAATATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAG
CAATGTAGTAACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAG
CATATGAAGAGGTGAAGCCAGAAAGCAAGTCATCACTTGCTGTTCTTGAT
ACATCTAAAATAACAAAATTGCAAGCCATAACCCAAAGAGGAAAGGGAAA
TGTAGTAGCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTC
GTTTAGATAGCCCAAAAGATgaTAAGCACAGCTTTAaAACTAAGGCAGAA
TTTGAGGAATTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAA
CGATAAGATTGtTTTTGCACATAACTACGCCAaCAATACAGAAACGGTGG
CTGATATTGCAGCAGCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAAT 
ATTTCGCATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACG 
TCCAGCAATCAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAG 
TCTTATTAATGCGTATTCCAGATAAAATTGATTCGGACAAATTTGGTGAA 
GCATATGCTAAAGCAATCACAGACGCTGTTAATCTAGGAGCAAAAACGAT
TAATATGAGCCTTGGAAAAACAGCAGATTCTTTAATTGCTCTCAATGATA
AAGTTAAATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTT
GTGGCTGCCGGAAATGAAGGTGCATTTGGTATGGATTATAGCAAACCATT
ATCAACTAATCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAG 
ATACTTTGAGTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTC 
GTTGAAACAACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTC 
TAAACCTTtTGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATG 
GTGCAAAAAAAAGACTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATT
AATTGAGCGTGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTA
CAAATGCAGGTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGT
GGAAATTTTCTAATTCCTTACCGTGAATTACCTGTGGGGGTTATTAGTAA 
AGTAGATGGCGAGCGTATAAAAAATACTTCAAGTCAGTTAACATTTAACC
AGAGTTTTGAAGTAGTTGATAGCCAAGGTGGCAATCGTATGCTGGAACAA
TCAAGTTGGGGCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGC
TTCTGGCTTTGAAATTTATTCTTCAACCTATAATAATCAATACCAAACAA
TGTCTGGTACAAGTATGGCTTCACCACATGtTGCAGGATTAATGACAATG
CTTCAAAGTCATTTGGCTGAGaAATATAAAGGGATGAATTTAGATTCTAA
AAAATTGCTAGaATTGTCTAAAAACATcCTCATGAGCTCAGCAACAGCAT
TATATAGTGAAGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCA
GGTGTAGTTGATGCTGAAAAAGCTATCCAAGCTCAATATTATGTTACTGG 
AAACGATGGCAAAGCTAAAATTAATCTCAAACGAGTGGGAGATAAATTTG
ATATCACAGTTACAATTCATAAACTTGTAGAAGGTGTCAAAGAATTGTAT
TATCAAGCTAATGTAGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCT
TaAACCaCAAGCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTcGTG 
ATAAAGAAACACAAGTTCGATTTACTAtTGATTCTAGTCAATTTAGTCAG
AAATTAAAAGAACAGATGGCAAATGGTTATTTCTTAGAAGGTTTTGTACG 
TTTTAAAGAAGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTG
TAGGATTTAATGGTGATTTTGCGAACTTACAAGCACTTGAAACACCGATT
TATAAGACGCTTTCTAAAGGTAGTTTCTACTATAAACCAAATGATACAAC
TCATAAAGACCAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACA
ACTATACTGCCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTAT
GTCAAAAATGGTGGGGAGTTAGAATTAGCACCGGAGAGTCCAAAAAGAAT
TATTTTAGGAACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTT
TGGAAAGAGATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAA
GATGGAAATAGGGATGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGT
TAAGGATATTTCTGCTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGC
AAAGTAAGGTTTTACCATCTTATCGTAAAAATTTCCATAATAATCCAAAG
CAAAGTGATGGTCATTATCGTATGGATGCCCTTCAGTGGAGTGGTTTAGA
TAAGGATGGCAAAGTTGTAGCAGATGGTTTTTATACTTATCGTTTAGGTT
ACACACCAGTAGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTT
CAAGTAAGTACTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGA
AACTAATCGAACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTC
CTACATATCGTCTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAA
TATGGAGATGAGACTTCTTACCATTATTTCCATATAGATCGAGAAGGTAA
AGTGACACTTCCTAAAACAGTTAAGATAGGAGAGAGTGAGGTTGCAGTAG
ACCCTAAGACCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCA 
ACGGTAAAATTGTCTGACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGA
AAACGCTATAGTAATTTCTAACAATTTCAAATATTTTGATAACTTGAAAA 
AAGAACCTATGTTTATTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTA
GAAGAAATAGCATTAGTTAAGCCGCAAACTACAGTTACTACTCAATCATT
GTCTAAAGAAATAACTCAATCAGGAAATGAGAAAGTCCTCACTTCTACAA
ACAATAATAGTAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGG
GATTCTGTTAACCATACC 

SEQ ID NO. 4404 
STRAIN H36B 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGTAATTGC
TAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATACTGTTGAAA
AAACATCTGTAACATCTGCTTCTGCTAGTAATACAGCGAAAGAAATGGGT
GATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGAAGAGTT
ATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAGAAGAAT
ATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAATGTAGTAACA
AATGCTTCAACTGCAATAGCACAGAAaGTTCCCTCAGCATATGAAGAGGT
GAAGCCAGAAAGCAAGTCATCACTTGCTGTTCTTGATACATCTAAAATAA 
CAAAATTGCAAGCCATAACCCAAAGAGGAAAGGGAAATGTAGTAGCTATT
ATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGATAGCCC
AAAAGATGATAAGCACAGCTTTAAAACTAAGGCAGAATTTGAGGAATTAA
AAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATAAGATTGTT
TTTGCACATAACTACGCCAaCAATACAGAAACGGTGGCTGATATTGCAGC
AGCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAATATTTCGCATGGTA
CACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCAATCAAT
GGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAATGCG
TATTCCAGATAAAATTGATTCGGACAAATTTGGTGAAGCATATGCTAAAG
CAATCACAGACGCTGTTAATCTAGGAGCAAAAACGATTAATATGAGCCTT
GGAAAAACAGCAGATTCTTTAATTGCTCTCAATGATAAAGTTAAATTAGC
ACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGCTGCCGGAA 
ATGAAGGTGCATTTGGTATGGATTATAGCAAACCATTATCAACTAATCCT 
GACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTTTGAGTGT
TGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGAAACAACTA
TTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAACCTTtTGAC 
AAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGCAAAAAAAGA 
CTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTGAGCGTGGTG
GTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAATGCAGGTGTT
GTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTTCTAAT 
TCCTTACCGTGAATTACCTGTGGGGGTTATTAGTAAAGTAGATGGCGAGC 
GTATAAAAAATACTTCAAGTCAGTTAACATTTAACCAGAGTTTTGAAGTA
GTTGATAGCCAAGGTGGCAATCGTATGCTGGAACAATCAAGTTGGGGCGT 
GACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGCTTTGAAA
TTTATTCTTCAACCTATAATAATCAATACCAAACAATGTCTGGTACAAGT
ATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGTCATTT
GGCTGAGAAATATAAAGGGATGAATTTAGATTCTAAAAAATTGCTAGAAT
TGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTATATAGTGAAGAG
GATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCAGGTGTAGTTGATGC
TGAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAAACGATGGCAAAG
CTAAAATTAATCTCAAACGAGTGGGAGATAAATTTGATATCACAGTTACA
ATTCATAAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAGCTAATGT 
AGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCTTAAACCaCAAGCCT 
TGCTAGATACTAATTGGCAGAAAGTAATTCTTCGTGATAAAGAAACACAA 
GTTCGATTTACTATTGATTCTAGTCAATTTAGTCAGAAATTAAAAGAACA
GATGGCAAATGGTTATTTCTTAGAAGGTTTTGtACGTTTTAAAGAAGCCA
AGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGATTTAATGGT
GATTTTGCGAACTtACAAGCACTTGAAACACCGATTTATAAGACGCTTTC
TAAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATAAAGACCAAT
TGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATACTGCCTTG 
TTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAATGGTGG
GGAGTTAgAATTAgCACCGGAGAGTCCAAAAAGAATTATTTTAGGAACTT
TTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAGATGCA
GCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAATAGGGA
TGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTAAGGATATTTCTG 
CTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGGTTTTA
CCATCTTATCGTAAAAATTTCCATAATAATCCAAAGCAAAGTGATGGTCA
TTATCGTATGGATGCCCTTCAGTGGAGTGGTTTAGATAAGGATGGCAAAG
TTGTAGCAGATGGTTTTTATACTTATCGTTTACGTTACACACCAGTAGCA 
GAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTTCAAGTAAGTACTAA
GTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGAAACTAATCGAACAT
TAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATCGTCTA 
CAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGAGATGAGAC
TTCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACTTCCTA
AAACAGTTAAGATAGGAGAGAGTGAGGTTGCAGTAGACCCTAAGACCTTG
ACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCAACGGTAAAATTGTC 
TGACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTATAGTAA 
TTTCTAACAATTTCAAATATTTTGATAACTTGAAAAAAGAACCTATGTTT
ATTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTAGAAGAAATAGCATT
AGTTAAGCCGCAAACTACAGTTAGTACTCAATCATTGTCTAAAGAAATAA
CTCAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATAGTAGC
AGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTTAACCA
TACC 

SEQ ID NO. 4405 
STRAIN 18RS21 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACC
TGTAATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATA
CTGTTGAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGCGAAA
GAAATGGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATT
AGAAGAGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTG
AAGAAGAATATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAAT
GTAGTAACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGCATA
TGAAGAGGTGAAGCCAGAAAGCAAGTCATCGCTTGCTGTTCTTGATACAT
CTAAAATAACAAAATTACAAGCCATAACCCAAAGAGGAAAGGGAAATGTA 
GTAGCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTT 
AGATAGCCCAAAAGATGATAAGCACAGCTTTAAAACTAAGACAGAATTTG
AGGAATTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGAT
AAGATTGTTTTTGCACATAACTACGCCAACAATACAGAAACGGTGGCTGA
TATTGCAGCAGCTATGAAAGATGGTTATGGTTCAGAAGCAAAGAATATTT
CGCATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCA
GCAATCAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTT
ATTAATGCGTATTCCAGATAAAATTGATTCGGACAAATTTGGTGAAGCAT
ATGCTAAAGCAATCACAGACGCTGTTAATCTAGGAGCAAAAACGATTAAT 
ATGAGTATTGGAAAAACAGCTGATTCTTTAATTGCTCTCAATGATAAAGT
TAAATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGG
CTGCCGGAAATGAAGGCGCATTTGGTATGGATTATAGCAAACCATTATCA 
ACTAATCcTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATAC 
TTTGAGTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTG
AAACAACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAA 
CCTTTTGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGC 
AAAAAAAGACTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTG
AGCGTGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAAT
GCAGGTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAA 
TTTTCTAATTCCTTACCGTGAATTACCTGTGGGGATTATTAgTAAAGTAG 
ATGGCGAGCGTATAAAAAATACTTCAAGTCAGTTAACATTtAACCAgAGT
TTTGAAGtAGTTGATAGCCAAGGTGGtAATCGTaTGCTGGAACAATCAAG 
TTGGGGCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTG 
GCTTTGAAATTTATTCTTCAACCTATAATAATCAATACCAAaCAATGTCT 
GGTACAAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCA
AAGTCATTTGGCTGAGAAATATAAAGGGATGAATTTAGATTCTAAAAAAT 
TGCTAGAATTGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTATAT
AGTGAAGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCAGGTGT
AGTTGATGCTGAAAAAGCTATCCAAGCTCaATATTATATTACTGGAAACG
ATGGCAaAGCTAAAATTAATCTCAAACGAATGGGAGATAAATTTGATATC
ACAGTTACAATTCATaAACTTGTAGAAGGTGTCAAAGAATTGTATTATCA
AGCTAATGTAGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCTTaAAC
CACAAGCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTcGTGATAAA
GAAACACAAGTTCGATTTACTATTGATGCTAGTCAATTTAGTCAGAAATT
AAAAGAACAGATGGCAAATGGTTATTTCTTAgAAGGTTTTGTACGTTTTA
AAGAAGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGA
TTTAATGGTGATTTTGCGAACTTACAAGCACTTGAAACACCGATTTATAA 
GACGATTTCTAAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATA 
AAGACCAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTAT 
ACTGCCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAA 
AAAT GGT GGG G AGT TAG AAT TAG C a C C G GAG AGT C C AAAAAG AAT TAT T T 
TAG G AAC T T T T G AGAAT AAG GT T GAG GAT AAAAC AAT T CAT CT T T T GG AA 
AG AG AT G C AG CG AAT AAT C CAT AT T T T G C CAT T T C T C C AAAT AAAG AT GG 
AAAT AGGG AC G AAAT C AC T C C C C AG G C AAC t T T CT T AAG AAAT G T T AAGG 
AT AT T T CTGCT C AAGT TCT AG AT C AAAAT GG AAAT GTTATTTGGC AAAGT 
AAGGT T TT AC CAT C T T AT CGT AAAAAT T T C CAT AAT AAT C C AAAGC AAAG 
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TGATGGTCATTATCGTATGGATGCTCTTCAGTGGAGTGGTTTAGATAAGG 
AT G G C AAAGT T GT AG C AGAT GGT T T T TAT AC T T AT CG C TT AC GT TAG AC A 
C C AGT AG C AG AAGG AG C AAAT AGT C AGGAGT C AG AC T T T AAAGT ACAAGT 
AAGT ACT AAGT C AC C AAAT C T T C C TT C AC GAGC T C AGT T T GAT G AAAC T A 
ATCGAACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACA 
TATCGTTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGG 
GG AT GAG AC T T CT T AC CAT TAT T T C CAT AT AGAT C AAGAAG GT AAAGT GA 
CACTTCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGACCCT 
AAGGCCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGcAACGGT 
AAAAT T GT C T GAT C T C T T G AAT AAG G C AGT AG TAT C AGAG AAAGAAAAC G 
C T AT AGT AAT T T C T AAC AG T T T C AAAT AT T T T GAT AAC T T G AAAAAAG AA 
C C T AT GT T T AT T T C T AAAAAAGAAAAAGT AG T AAAC AAGAAT C T AGAAGA 
AAT AAT AT T AGT T AAG C C G C AAACT AC AGT T AC T ACT CAAT CAT T GT C T A 
AAGAAAT AAC T AAAT C AG G AAAT GAG AAAGT C C T C ACT T CT AC AAAC AAT 

AATAGTAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTC 
TGTTAACCATACC 

SEQ ID NO. 4406 
STRAIN M732 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCT
GTAATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATAT
TGTTGAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGTGAAAG
AAATGGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTA
GAAGAGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGA
AGAAGAATATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAATG
TAGTAACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGCATAT
GAAGAGGTGAAGTCAGAAAGCAAGTCATCGCTTGCTGTTCTTGATACATC
TAAAATAACAAAATTACAAGCCACAACCCAAAGAGGAAAGGGAAATGTAG
TAGCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTA
GATAGCCCAAAAGATGATAAGCACAGCTTTAAAACTAAGGCAGAATTTGA
GGAATTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATA
AGATTGTTTTTGCACATAACTACGCCAACAATACAGAAACGGTGGCTGAT
ATTGCAGCAGCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAATATTTT
GCATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAG
CAATCAATAGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTA
TTAATGCGTATTCCAGATAAAATTGATTCGGACAAATTTGGAGAAGCATA
TGCTAAAGCAATCATAGACGCTGTTAATCTAGGAGCAAAAACGATTAATA
TGAGCCTGGGAAAAACGGCTGATTCTTTAATTGCTCTCAATGATAAAGTT
AAATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGC
TGCCGGAAATGAAGGTGCATTTGGTATGGATTATAGCAAACCATTATCAA
CTAATCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACT
TTGAGTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGA
AACAACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAAC
CTTtTGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGCA
AAAAAGATTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTGAG
CGTGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAATGC
AGGTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATT
TTCTAATTCCTTACCGTGAATTACCTGTGGGGGTTATTAGTAAAGTAGAT
GGCGAGCGTATAAAAAATACTTCAAGTCAGTTAACATTTAACCAGAGTTT
TGAAGTAGTTGATAGCCAAGGTGGCAATCGTATGCTGGAACAATCAAGTT
GGGGCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGC
TTTGAAATTTATTCTTCAACCTATAATAATCAATACTAAACAATGTCTGG
TACAAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAA
GTCATTTGGCTGAGAAATATAAAGGGATGAATTTAGATTCTAAAAAATTG
CTAGAATTGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTATATAG
TGAAGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCAGGTGTAG
TTGATGCTGAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAAACGAT
GGCAAAGTTAAAATTAATCTCAAACGAGAGGGAGATAAATTTGATATCAC
AGTTACAATTCATaAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAG
CTAATGTAGCAACAGAaCAAGTAAATAAAGGTAAATTTGCCCTTaAACCA
CAAGCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTCGTGATAAAGA
AACACAAGTTCGATTTACTATTGATGCTAGTCAATTTAGTCAGAAATTAA
AAGAACAGATGGCAAATGGTTATTTCTTAGAAGGTTTTGTACGTTTTAAA
GAAGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGATT
TAATGGTGATTTTGCGAACTTACAAGCACTTGAAACaCCGATTTATAAGA
CGCTTTCTAAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATAAA
GACCAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATAC
TGCCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAA
ATGGTGGGGAGTTAGAATTAGCACCGGAGAGTCCAAAAAGAATTATTTTA
GGAACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAG
AGATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAA
ATAGGGACGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTAAGGAT
ATTTCTGCTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAA
GGTTTTACCATCTTATCGTAAAAATTTCCATAATAATCCAAAGCAAAGTG
ATGGTCATTATCGTATGGATGCTCTTCAGTGGAGTGGTTTAGATAAGGAT
GGCAAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTACACACC
AGTAGCAGAAGGAGCaAATAGTCAGGAGTCAGACTTTAAAGTTCAAGTAA
GTACTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGAAACTAAT
CGAACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACATA
TCGTTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGG
ATGAGACTTCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACA
CTTCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGACCCTAA
GGCCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTTGCAACGGTAA
AATTGTCTGACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGaAAACGCT
ATAGTAATTTCTAACAGTTTCAAATATTTTGATAACTTGAAGAAAGAACC
TATGTTTATTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTAGAAGAAA
TAACATTAGTTAAGCCTCAAACTACAGTTACTACTCAATCATTGTCTAAA
GAAATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAA
TAGTAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTG
TTAACCATACC 

SEQ ID NO. 4407 
STRAIN COH1 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGT 
AATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTaACTACTAATATTG 
TTGAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGTGAAAGAA 
ATGGGtgATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGA
AGAGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAG
AAGAATATCCCTCTAAACCAGAGaCAACCAACAATAAAGAAAGCAATGTA
GTAACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGCATATGA
AGAGGTGAAGTCAGAAAGCAAGTCATCGCTTGCTGTTCTTGATACATCTA
AAATAACAAAATTAGAAGCCACAACCCAAAGAGGAAAGGGAAATGTAGTA
GCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGA
TAGCCCAAAAGATGATAAGCACAGCTTTAAAACTAAGGCAGAATTTGAGG
AAtTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATAAG
ATTGTTTTTGCACATAACTACGCCAaCAATACAGAAACGGTGGCTGATAT
TGCAGCAGCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAATATTTTGC
ATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCA
ATCAATAGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATT
AATGCGTATTCCAGATAAAATTGATTCGGACAAATTTGGAGAAGCATATG
CTAAAGCAATCATAGACGCTGTTAATCTAGGAGCAAAAACGATTAATATG
AGCCTGGGAAAAACGGCTGATTCTTTAATTGCTCTCAATGATAAAGTTAA
ATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGCTG
CCGGAAATGAAGGTGCATTTGGTATGGATTATAGCAAACCATTATCAACT
AATCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTTT
GAGTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGAAA
CAACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAACCT
TtTGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGCAAA
AAAGATTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTGAGCG
TGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAATGCAG
GTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTT
CTAATTCCTTACCGTGAATTAGCTGTGGGGGTTATTAGTAAAGTAGATGG
CGAGCGTATAAAAAATACTTCAAGTCAGTTAACATTTAACCAGAGTTTTG
AAGTAGTTGATAGCCAAGGTGGCAATCGTATGCTGGAACAATCAAGTTGG
GGCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGCTT
TGAaATTTATTCTTCAACCTATAATAATCAATACTAAACAATGTCTGGTA
CAAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGT
CATTTGGCTGAGAAATATAAAGGGATGAATTTAGATTCTAaAAAATTGCT
AGaATTGTCTAaaAACATCCTCATGAGCTCAGCAACAGCATTATATAGTG
AAGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGTGGAGGTGTAGTT
GATGCTGAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAAACGATGG
CAAAGTTAAAATTAATCTCAAACGAGAGGGAGATAAATTTGATATCACAG
TTACAATTCATaAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAGCT
AATGTAGCAaCAGAACAAGTAAATAAAGGTAAATTTGCCCTTAAACCACA
AGCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTcGTGATAAAGAAA
CACAAGTTCGATTTACTATTGATGCTAGTCAATTTAGTCAGAAATTAAAA
GAACAGATGGCAAATGGTTATTTCTTAGAAGGTTTTGTACGTTTTAAAGA
AGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGATTTA
ATGGTGATTTTGCGAACTTACAAGCACTTGAAACACCGATTTATAAGACG
CTTTCTAAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATAAAGA
CCAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATACTG
CCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAAT
GGTGGGGAGTTAGAATTAGCACCGGAGAGTCCAAAAAGAATTATTTTAGG
aACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAG
ATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAAT
AGGGACGAAATCACTCCCCAGGCaACTTTCTTAAGAAATGTTAAGGATAT
TTCTGCTCAAGtTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGG
TTTTACCATCTTATCGTAAAAATTTCCATAATaATCCAAAGCAAAGTGAT
GGTCATTATCGTATGGATGCTCTTCAGTGGAGTGGTTTAgATAAGGATGG
CAAAGTTGTAgCAGATGGtTTTTATACTTATCGCTTACGTTACACACCAG
TAGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTaAAGTTCAAGTAAGT
AcTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGaAACTAATCG
AACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATC
GTTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGGAT
GAGACTTCTTAGCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACT
TCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGACCCTAAGG
CCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTTGCAACGGTAAAA
TTGTCTGACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTAT
AGTAATTTCTAACAGTTTCAAATATTTTGATAACTTGAAGAAAGAACCTA
TGTTTATTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTAGAAGAAATA
ACATTAGTTAAGCCTCAAACTACAGTTACTACTCAATCATTGTCTAAAGA
AATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATA
GTAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTT
AACCATACC 

SEQ ID NO. 4408 
STRAIN M781 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGT
AATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATATTG
TTGAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGTGAAAGAA
ATGGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGA
AGAGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAG
AAGAATATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAATGTA
GTAACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGCATATGA
AGAGGTGAAGTCAGAAAGCAAGTCATCGCTTGCTGTTCTTGATACATCTA
AAATAACAAAATTACAAGCCACAACCCAAAGAGGAAAGGGAAATGTAGTA
GCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGA
TAGCCCAAAAGATGATAAGCACAGCTTTAAAACTAAGGCAGAATTTGAGG
AATTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATAAG
ATTGTTTTTGCACATAACTACGCCAaCAATACAGAAACGGTGGCTGATAT
TGCAGCAGCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAATATTTTGC
ATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCA
ATCAATAGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATT
AATGCGTATTCCAGATAAAATTGATTCGGACAAATTTGGAGAAGCATATG
CTAAAGCAATCATAGACGCTGTTAATCTAGGAGCAAAAACGATTAATATG
AGCCTGGGAAAAACGGCTGATTCTTTAATTGCTCTCAATGATAAAGTTAA
ATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGCTG
CCGGAAATGAAGGTGCATTTGGTATGGATTATAGCAAaCCATTATCAaCT
AATCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTTT
GAGTGTTGCTAGCTATGAATCACTtAAAACTATCAGTGAGGTCGTTGAAA
CAACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACtTCTAaACCT
TTTGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGCAAA
AAAGATTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTGAGCG
TGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAATGCAG
GTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTT
cTAATTCCTTACCGTGAATTACCTGTGgGGGTTATTAGTAAAGTAGATGG
CGAGCGTATAAAAAATACTTCAAGTCAGTTAACATTTAACCAGAGTTTTg
AAGTAGTTGATAGCCAAGGTGGCAATCGTATGCTGGAACAATCAAGTTGG
GGCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGCTT
TGAAATTTATTCTTCAACCTATAATAATCAATACTAAACAATGTCTGGTA
CAAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGT
CATTTGGCTGAGAAATATAAAGGGATGAATTTAGATTCTAAAAAATTGCT
AGAATTGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTATATAGTG
AAGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCAGGTGTAGTT
GATGCTGAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAAACGATGG
CAAAGTTAAAATTAATCTCAAACGAGAGGGAGATAAATTTGATATCACAG
TTACAATTCATaaACTTGTAgAAGGTGTCAAAGAATTGTATTATCAAGCT
AATGTAGCaaCAGAACAAGTAAATAaAGGTAAATTTGCCCTTaAaCCaCA
AGCCTTGCTAGATACTAATTGGCAGAaAGTaATTCTTcGTGATAAAGAAA
CACAAGTTcGATTTACTAtTGATGCTAGTCAATTTAGTCAGAAATTAAAA
GAACAGATGGCAAATGGTTATTTCTTAGAAGGTTTTGTACGTTTTAAAGA
AGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGATTTA
ATGGTGATTTTGCGAACTtACAAGCACTTGAAACACCGATTTATAAGACG
CTTTCTAAAGGTAGTTTCTACTATAAaCCAAATGATACAACTCATAAAGA
CCAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATACTG
CCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAAT
GGTGGGGAGTTAGAATTAGCACCGGAGAGTCCAAAAAGAATTATTTTAGG
AACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAG
ATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAAT
AGGGACGaaATCACTCCCCAGGCaACtTTCTTAAGAAATGTTAAGGATAT
TTCTGCTCAAGtTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGG
TTTTACCATCTTATCGTAAAAATTTCCATAATaATCCAAAGCAAAGTGAT
GGTCATTATCGTATGGATGCTCTTCAGTGGAGTGGTTTAGATAAGGATGG
CAAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTACACACCAG
TAGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTTCAAGTAAGT
ACTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGAAACTAATCG
AACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACAtATC
GTTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGGAT
GAGACTTCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACT
TCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGACCCTAAGG
CCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTTGCAACGGTAAAA
TTGTCTGACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTAT
AGTAATTTCTAACAGTTTCAAATATTTTGATAACTTGAAGAAAGAACCTA
TGTTTATTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTAGAAGAAATA
ACATTAGTTAAGCCTCAAACTACAGTTACTACTCAATCATTGTCTAAAGA
AATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATA
GTAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTT
AACCATACC

SEQ ID NO. 4409 
STRAIN CJB110 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGTAA
TTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATATTGTT
GAAAAAACATCTGTAnCAGCTGCTTCTGCTAGTAATACAGCGAAAGAAAT
GGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGAAG
AGTTATCTAAAAACCTTGATACGTCTAATwTGGGGGCTGATCTTGAAGAA
GAATATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAATGTAGT
AACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGCGTATGAAG
AGGTGaAGCCAGAAAGCAAGTCATCGCTTGCTGTTTTTGATACATCTAAA
ATAACAAAATTGCAAGCCATAACCCAAAGAGGAAAGGGAAATGTAGTAGC
TATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGATA
GCCCAAAAGATGATAAGCACAGCTTTAAAACTAAAGCAGAATTCGAGGAA
tTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATAAGAT
TGTTTTTGCACATAACTACGCCAACAATACAGAAACGGTGGCTGATATTG
CAGCAGCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAATATTTCGCAT
GGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCAAT
CAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAA
TGCGTATTCCAGATAAAATTGATTCGGACAAATTTGGAGAAGCATATGCT
AAAGCAATCACAGACGCTGTTAATCTAGGAGCAAAAACGATTAATATGAG
CCTTGGAAAAACAGCAGATTCTTTAATTGCACTCAATGATAAAGTTAAAT
TAgCACTTAAATTAGCTTcTGAGAAGGGCGTTGCAGTTGTTGTGGCTGCC
GGAAATGAAGGTGCATTTGGTATGGATTATAgCAAACCATTATCAACTAA
TcCTGACTACGGtACGGTTAATAGTCCAGCTATTTcTGAAGATACTTTGA
GTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGaAACA
ACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTcTAAACCTTT
TGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGCAAAAA
AAGACTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTGAGCGT
GGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAATGCAGG
TGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTTc
TAATTCCTTACCGTGAATTACCTGTGgGGGTTATTAGTAAAGTAGATGGC
GAGCGTATAAAAAATACTTCAAGTCAGTTAACATTTAACCAgAGTTTTGA
AGTAgTTGATAGCCAAgGTGGCAATCGTATGCTGGAACAATCAAGTtGGG
GCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGCTTT
GAAATTTATTCTTCAACCTATAATAATCAATACCAAACAATGTCTGGTAC
AAGTATGGCTTCACCACATGtTGCAGGATTAATGACAATGCTTCAAAATC
ATTTGGCTGAGAAATATAAAGGGATGAATTTAGATTCTAAAAAATTGCTA
GAATTGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTATATAGTGA
AGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGtGCAGGTGTAGTTG
ATGCTGAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAAACGATGGC
AAAGCTAAAATTAATCTCAAACGAGTGGGAGATAAATTTGATATCACAGT
TACAATTCATAAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAGCTA
ATGTAGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCTTaAACCACAA
GCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTcGTGATAAAGAAAC
ACAAGTTCGATTTACTAtTGATGCTAGTCAATTTAgTCAGAAATTAAAAG
AACAGATGGCAAATGGTTATTTCTTAgAAGGTTTTGTACGTTTTAAAGAA
GCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGATTTAA
TGGTGATTTTGCGAACTtACAAGCACTTGAAACACCGATTTATAAGACGC
TTTCTAAAGGTAGTtTCTACTATAAACCAAATGATACAACTCATAAAGAC
CAATTGGAGTACAATGAATCAGCTCctTTTGAAAGCAACAACTATACTGC
CTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAATG
GTGGGGAGTTAGAATTAGCACCGGAGAGTCCAAAAAGAATTATTTTAGGA
ACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAGA
TGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAATA
GGGATGaaATCACTCCCCAGGCAACtTTCTTAAGAAATGTTAAGGATATT
TCTGCTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGGT
TTTACCATCTTATCGTAAAAATTTCCATAATAATCCAAAGCAAAGTGATG
GTCATTATCGTATGGATGCCTTTCAGTGGAGTGGTTTAgATAAgGATGGC
AAAGTTGTAGCAGATGGTTTTTATACTTATCGCCTACGTTACACACCAGT
AGCAGAAgGAGCAAATAGTCAGGAGTCAgACTTTAAAGTTCAAGTAAGTA
CTAAGTCACCAAATCTTCCTTTACTAGCTCAGTTTGATGAAACTAATCGA
ACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATCG
TTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGGATG
AGACTTCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACTT
CCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCAGTAGACCCTAAGGC
CTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTTGCAACGGTaAAAT
TGTCTGACCTCTTGAaTAAgGCAGTAGTATCAGAGAAAGAAAACGCTATA
GTAATTTCTAACAGTTTCAAATATTTTGATAACTTGAAAAAAGAATCTAT
GTTTATTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTAGAAGAAATAA
CATTAGTTAAGCCGCAaACTACAGTTACTACTCAATCATTGTCTAAAGAA
ATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATAG
TAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTTA
ACCATACC 

SEQ ID NO. 4410 
STRAIN 1169NT 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATC
ACCTGTAATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTA
ATATTGTTGAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGCG
AAAGAAATGGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAATT
ATTAGAAGAGTTATCTAAAAACCTTGATACGTCTAATATGGGGGCTGATC
TTGAAGAAGAATATCCCTCTAAACCAGAGACAACCAACAATAAGGAAAGC
AATGTAGTAACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGC
ATATGAAGAGGTGAAGCCAAAAAGCAAGTCATCGCTTGCTGTTCTTGATA
CATCTAAAATAACAAAATTGCAAGCCATAACCCAAAGAGGAAAGGGAAAT
GTAGTAGCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCG
TTTAGATAGCCCAAAAGATGATAAGCACAGCTTTAAAAATAAGGCAGAAT
TCGAGGAATTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAAC
GATAAGATTGTTTTTGCACATAACTACGCCAACAATACAGAAACGGTGGC
TGATATTGCAGCAGCTATGAAAgATGGTTATGGTTCAGAAGCAAAGAATA
TTTCGCATGGTACACACGTTGCTGGTATTtTTGTAGGTAATAGTAAACGT
CCAGCAATCAATGGTCTTCTTTTAgAAGGTGCAgCGCCAAATGCTCAAGT
CTTATTAATGCGTATTCCAGATAAAATtGATTCGGACAAATTtGGAGAAG
CATATGCTAAAGCAATCACAGACGCTGTTAATCTAGGAGCTaAAACGATT
AATATGAGTATTGGAAAAACAGCTGATTCTTTAATTGCTCTCAATGATAA
AGTTAAATTAgCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTG
TGGCTGcCGGAAATGAAGGCGCATTtGGTATGGATTATAGCAAACCGTTA
TCAACTAATcCTGACTACGGtACGGtTAATAGTCCAGCTATTTCTGAAGA
TACTTTGAGTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCG
TTGAAACAACTATTGAAGGTAAGTTAGTTAAGTtGCCGATTGtGACTTCT
AAACCTTttGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGG
TGCAAAAAAAGACTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAA
TTGAGCGTGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACA
AATGCAGGTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGG
AAATTTTCTAATTCCTTACCGTGAATTACCTGTGGGGGTTATTAGTAAAG
TAGATGGCGAGCGTATAAAAaATACTTCAAGTCAGTTAACATTTAACCAg
AGATTTGAAGTAGTTGATAGCCAAgGTGGCAATCGTATGCTGGAACAATC
aAGTtGGGGCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTT
CTGGCTTCGaAATTTATTCTTCaaCCTATAATAATCAATACCAAACAATG
TCTGGTACAAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCT
TCAAAGTCATTTGGCTGAGaAATATAAAGGGATGAATTTAgATTCTAaAA
AATTGCTAGAATTGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTA
TATAGTGAAGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGtGCAGG
TGTAGTTGATGCTGAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAA
ACGATGGCAAAGCTAAAATTAATCTCAAACGAGTGGGAGATAAATTTGAT
ATCACAGTTACAATTCATAAACTTGTAGAAGGTGTCAAAGAATTGTATTA
TCAAGCTAATGTAGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCTTA
AACCACAAGCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTcGTGAT
AAAGAAACACAAGTTCGATTTACTATTGATGCTAGTCAATTTAgTCAGAA
ATTAAAAGAACAGATGGCAAATGGTTATTTCTTAgAAGGTTTTGTACGTT
TTAAAGAAGCTAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTA
GGATTTAATGGTGATTTTGCGAGCTTACAAGCACTTGAAACACCGATTTA
TAAGACGCTTTCTAAAGGTAGTTTCTACTATAAACCAAATGATACAACTC
ATAAAGACCAATTGGAGTATAATGAATCAGCTCCTTTTGAAAGCAACAAC
TATACTGCCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGT
CaAAAATGGTGGGGAGTTAGAATTAGCACCGGAGAGTcCAAAAAGAATTA
TTTTAGGAACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTG
GAAAGAGATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGA
TGGAAATAGGGATGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTA
AGGATATTTCTGCTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAA
AGTAAGGTTTTACCATCTTATCGTAAAAATTTCCATAATAATCCAAAGCA
GAGTGATGGTCATTATCGTATGGATGCCCTTCAGTGGAGTGGTTTAgATA
AGGATGGCAAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTAC
ACACCAGTAGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTTCA
AGTAAGTACTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGaAA
CTAATCGAACATTAAGCTTAGCCATGCCTAAGGGAAGTAGTTATGTTCCT
ATATATCGTCTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATA
TGGAGATGAGACTTCTTACTATTATTTCCATATAGATCAAGAAGGTAAAG
CGACACTTCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCAGTAGAC
CCTAAGGCCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCAaC
GGTAAAATTGTCTGACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAA
ACGCTATAGTAATTTCTAACAGTTTCAAATATTTTGATAACTTGAAAAAA
GAACCTATGTTTATTTCTAAAAAAGAAAAAGTAGTAAACAAGAATCTAGA
AGAaATAATATTAGTTAAGCCGCAcACTACAGTTACTACTCAaTCATTGT
CTAAAGAAATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAAC
AATAATAGTAGTAGAGTAGCTAAAATCATATCACCTAAACATAATGGGGA
TTCTGTTAACCATACC 

SEQ ID NO. 4411
STRAIN JM9130013 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGTAA
TTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATACTGTT
GAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGCGAAAGAAAT
GGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGAAG
AGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAGAA
GAATATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAATGTAGT
AACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGCATATGAAG
AGGTGAAGCCAGAAAGCAAGTCATCGCTTGCTGTTCTTGATACATCTAAA
ATAACAAAATTACAAGCCATAACCCAAAGAGGAAAGGGAAATGTAGTAGC
TATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGATA
GCCCAAAAGATGATAAGCACAGCTTTAAAACTAAGACAGAATTTGAGGAA
TTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATAAGAT
TGTTTTTGCACATAACTACGCCAACAATACAGAAACGGTGGCTGATATTG
CAGCAGCTATGAAAGATGGTTATGGTTCAGAAGCAAAGAATATTTCGCAT
GGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCAAT
CAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAA
TGCGTATTCCAGATAAAATTGATTCGGACAAATTTGGTGAAGCATATGCT
AAAGCAATCACAGACGCTGTTAATCTAGGAGCAAAAACGATTAATATGAG
TATTGGAAAAACAGCTGATTCTTTAATTGCTCTCAATGATAAAGTTAAAT
TAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGCTGCC
GGAAATGAAGGCGCATTTGGTATGGATTATAGCAAACCATTATCAACTAA
TCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTTTGA
GTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGAAACA
ACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAACCTTT
TGACAAAgGTAAgGCCTACGATGTGGTTTATGCCAATTATGGTGCAAAAA
AAGACTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTGAGCGT
GGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAATGCAGG
TGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTTC
TAATTCCTTACCGTGAATTACCTGTGGGGATTATTAGTAAAGTAGATGGC
GAGCGTATAAAAAATACTTCAAGTCAGTTAACATTTAACCAGAGTTTTGA
AGTAGTTGATAGCCAAGGTGGTAATCGTATGCTGGAACAATCAAGTTGGG
GCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGCTTT
GAAATTTATTCTTCAACCTATAATAATCAATACCAAACAATGTCTGGTAC
AAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGTC
ATTTGGCTGAGAAATATAAAGGGaTGAATTTAGATTCTAAAAAATTGCTA
GAATTGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTATATAGTGA
AGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCAGGTGTAGTTG
ATGCTGAAAAAGCTATCCAAGCTCaATATTATATTACTGGAAACGATGGC
AAAGCTAAAATTAATCTCAAACGAATGGGAGATAAATTTGATATCACAGT
TACAATTCATaAACTTGTAGAAGGTGTCAAAGAAtTGTATTATCAAGCTA
ATGTAGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCTTaAACCACAA
GCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTCGTGATAAAGAAAC
ACAAGTTCGATTTACTATTGATGCTAGTCAATTTAGTCAGAAATTAAAAG
AACAGATGGCAAATGGTTATTTCTTAGAAGGTTTTGTACGTTTTAAAGAA
GCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGATTTAA
TGGTGATTTTGCGAACTTACAAGCACTTGAAACACCGATTTATAAGACGC
TTTCTAAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATAAAGAC
CAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATACTGC
CTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAATG
GTGGGGAGTTAGAATTAGCACCGGAGAGTCCAAAAAGAATTATTTTAGGA
ACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAGA
TGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAATA
GGGACGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTAAGGATATT
TCTGCTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGGT
TTTACCATCTTATCGTAAAAATTTCCATAATAATCCAAAGCAAAGTGATG
GTCATTATCGTATGGATGCTCTTCAGTGGAGTGGTTTAGATAAGGATGGC
AAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTACACACCAGT
AGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTACAAGTAAGTA
CTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGAAACTAATCGA
ACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATCG
TTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGGATG
AGACTTCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACTT
CCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGACCCTAAGGC
CTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCAaCGGTAAAAT
TGTCTGATCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTATA
GTAATTTCTaACAGTTTCAAATATTTTGATAACTTGAAAAAAGAACCTAT
GTTTATTTCTAAAAAAGAAAAAGTAGTAAACAAGAATCTAGAAGAAATAA
TATTAGTTAAGCCGCAAACTACAGTTACTACTCAATCATTGTCTAAAGAA
ATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATAG
TAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTTA
ACCATACC 

SEQ ID NO. 4412 
STRAIN 2603 

VDKHHSKKAILKLTLITTSILLMHSNQVNAEEQELKNQEQSPVIANVAQQPSPSVTTNTV 
EKTSVTAASASNTAKEMGDTSVKNDKTEDELLEELSKNLDTSNLGADLEEEYPSKPETTN
NKESNVVTNASTAIAQKVPSAYEEVKPESKSSLAVLDTSKITKLQAITQRGKGNVVAIID
TGFDINHDIFRLDSPKDDKHSFKTKTEFEELKAKHNITYGKWVNDKIVFAHNYANNTETV
ADIAAAMKDGYGSEAKNISHGTHVAGIFVGNSKRPAINGLLLEGAAPNAQVLLMRIPDKI
DSDKFGEAYAKAITDAVNLGAKTINMSIGKTADSLIALNDKVKLALKLASEKGVAVVVAA
GNEGAFGMDYSKPLSTNPDYGTVNSPAISEDTLSVASYESLKTISEVVETTIEGKLVKLP
IVTSKPFDKGKAYDVVYANYGAKKDFEGKDFKGKIALIERGGGLDFMTKITHATNAGVVG
IVIFNDQEKRGNFLIPYRELPVGIISKVDGERIKNTSSQLTFNQSFEVVDSQGGNRMLEQ
SSWGVTAEGAIKPDVTASGFEIYSSTYNNQYQTMSGTSMASPHVAGLMTMLQSHLAEKYK
GMNLDSKKLLELSKNILMSSATALYSEEDKAFYSPRQQGAGVVDAEKAIQAQYYITGNDG
KAKINLKRMGDKFDITVTIHKLVEGVKELYYQANVATEQVNKGKFALKPQALLDTNWQKV 
ILRDKETQVRFTIDASQFSQKLKEQMANGYFLEGFVRFKEAKDSNQELMSIPFVGFNGDF 
ANLQALETPIYKTLSKGSFYYKPNDTTHKDQLEYNESAPFESNNYTALLTQSASWGYVDY 
VKNGGELELAPESPKRIILGTFENKVEDKTIHLLERDAANNPYFAISPNKDGNRDEITPQ 
ATFLRNVKDISAQVLDQNGNVIWQSKVLPSYRKNFHNNPKQSDGHYRMDALQWSGLDKDG 
KVVADGFYTYRLRYTPVAEGANSQESDFKVQVSTKSPNLPSRAQFDETNRTLSLAMPKES
SYVPTYRLQLVLSHVVKDEEYGDETSYHYFHIDQEGKVTLPKTVKIGESEVAVDPKALTL
VVEDKAGNFATVKLSDLLNKAVVSEKENAIVISNSFKYFDNLKKEPMFISKKEKVVNKNL
EEIILVKPQTTVTTQSLSKEITKSGNEKVLTSTNNNSSRVAKIISPKHNGDSVNHTLPST
SDRATNGLFVGTLALLSSLLLYLKPKKTKNNSK

SEQ ID NO. 4413 
STRAIN A909 

EEQELKNQEQSPVIANVAQQPSPSVTTNTVEKTSVTSASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKPESK 
SSLAVLDTSKITKLQAITQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDWYANYGAKKRL.R.G 
L.R.DCIN.AWWWT.FYD.NHSCYKCRCCWYRYF.RSRKTWKFSNSLP.ITCGGY..SRW
RAYKKYFKSVNI.PEF.SS..PRWQSYAGTIKLGRDS.RSNQA.CNSFWL.NLFFNL..S
IPNNVWYKYGFTTCCRINDNASKSFG.EI.RDEFRF.KIARIV.KHPHELSNSII..RG.
GVLFTTSARCRCS.C.KSYPSSILCYWKRWQS.N.SQTSGR.I.YHSYNS.TCRRCQRIV
LSS.CSNRTSK.R.ICP.TTSLARY.LAESNSS..RNTSSIYY.F.SI.SEIKRTDGKWL
FLRRFCTF.RSQG..SGVNEYSFCRI.W.FCELTST.NTDL.DAF.R.FLL.TK.YNS.R
PIGVQ.ISSF.KQQLYCLVNTISVLGLC.LCQKWWGVRISTGESKKNYFRNF.E.G.G.N
NSSFGKRCSE.SIFCHFSK.RWK.G.NHSPGNFLKKC.GYFCSSSRSKWKCYLAK.GFTI
LS.KFP..SKAK.WSLSYGCPSVEWFR.GWQSCSRWFLYLSFTLHTSSRRSK.SGVRL.S
SSKY.VTKSSFTSSV..N.SNIKLSHA.GK.LCSYISSTISFISCCKR.RIWR.DFLPLF
PYRSRR.SDTS.NS.DRRE.GCSRP.DLDTCCGR.SW.FRNGKIV.PLE.GSSIRERKRY
SNF.QFQIF..LEKRTYVYF.RRKSSKQESRRNSIS.AANYSYYSIIV.RNNSIRK.ESP
HFYKQ...QSS.DHIT.T.RGFC.PY

SEQ ID NO. 4414 
STRAIN H36B 

EEQELKNQEQSPVIANVAQQPSPSVTTNTVEKTSVTSASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKPESK 
SSLAVLDTSKITKLQAITQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG 
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAVVVAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGVVGIVIFNDQEKRGNFLIPYRELPVGVISKVDG 
ERIKNTSSQLTFNQSFEVVDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGWDAEKAIQAQYYVTGNDGKAKINLKRVGDKFDITVTIHKLVEGVKELY 
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDSSQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTLSKGSFYYKPNDTTHKD
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS 
YRKNFHNNPKQSDGHYRMDALQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV 
QVSTKSPNLPSRAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHWKDEEYGDETSYHYF 
HIDQEGKVTLPKTVKIGESEVAVDPKTLTLVVEDKAGNFATVKLSDLLNKAVVSEKENAI 
VISNNFKYFDNLKKEPMFISKEGKVVNKNLEEIALVKPQTTVTTQSLSKEITQSGNEKVL 
TSTNNNSSRVAKIISPKHNGDSVNHT 

SEQ ID NO. 4415 
STRAIN 18RS21 

EEQELKNQEQSPVIANVAQQPSPSVTTNTVEKTSVTAASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNVVTNASTAIAQKVPSAYEEVKPESK 
SSLAVLDTSKITKLQAITQRGKGNVVAIIDTGFDINHDIFRLDSPKDDKHSFKTKTEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSIGK 
TADSLIALNDKVKLALKLASEKGVAVVVAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEVVETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGWGIVIFNDQEKRGNFLIPYRELPVGIISKVDG 
ERIKNTSSQLTFNQSFEWDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGWDAEKAIQAQYYITGNDGKAKINLKRMGDKFDITVTIHKLVEGVKELY 
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTISKGSFYYKPNDTTHKD
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS 
YRKNFHNNPKQSDGHYRMDALQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV 
QVSTKSPNLPSRAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHWKDEEYGDETSYHYF 
HIDQEGKVTLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAWSEKENAI 
VISNSFKYFDNLKKEPMFISKKEKWNKNLEEIILVKPQTTVTTQSLSKEITKSGNEKVL 
TSTNNNSSRVAKIISPKHNGDSVNHT 

SEQ ID NO. 4416 
STRAIN M732 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTVKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNVVTNASTAIAQKVPSAYEEVKSESK
SSLAVLDTSKITKLQATTQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNILHGTHVAGIFVG
NSKRPAINSLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAIIDAVNLGAKTINMSLGK
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKILKVRT
LKVRLH.LSVWDLIL.LKSLMLQMQVLLVSLFLTIKKNVEIF.FLTVNYLWGLLVK.MA
SV.KILQVS.HLTRVLK.LIAKVAIVCWNNQVGA.QLKEQSSLM.QLLALKFILQPIIIN
TKQCLVQVWLHHMLQD..QCFKVIWLRNIKG.I.ILKNC.NCLKTSS.AQQQHYIVKRIR
RFIHHVSKVQV.LMLKKLSKLNIMLLETMAKLKLISNEREINLISQLQFINL.KVSKNCI
IKLM.QQNK.IKVNLPLNHKPC.ILIGRK.FFVIKKHKFDLLLMLVNLVRN.KNRWQMVI
S.KVLYVLKKPRIVIRS..VFLL.DLMVILRTYKHLKHRFIRRFLKWSTINQMIQLIKT
NWSTMNQLLLKATTILPC.HNQRLGAMLIMSKMVGS.N.HRRVQKELF.ELLRIRLRIKQ
FIFWKEMQRIIHILPFLQIKMEIGTKSLPRQLS.EMLRIFLLKF.IKMEMLFGKVRFYHL
IVKISIIIQSKVMVIIVWMLFSGVV.IRMAKL.QMVFILIAYVTHQ.QKEQIVRSQTLKF
K.VLSHQIFLHELSLMKLIEH.A.PCLRKVVMFLHIVYN.FYLML.KMKNMGMRLLTIIS
I.IKKVK.HFLKRLR.ERVRLR.TLRP.HLLWKIKLVILQR.NCLTS.IRQ.YQRKKTL.
.FLTVSNILIT.RKNLCLFLKKEK..TRI.KK.H.LSLKLQLLLNHCLKK.LNQEMRKSS
LLQTIIVAE.LRSYHLNITGILLTI

SEQ ID NO. 4417
STRAIN COH1 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTVKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKSESK 
SSLAVLDTSKITKLQATTQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEEEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNILHGTHVAGIFVG 
NSKRPAINSLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAIIDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDWYANYGAKKILKVRT
LKVRLH.LSWVDLIL.LKSLMLQMQVLLVSLFLTIKKNVEIF.FLTVNYLWGLLVK.MA
SV.KILQVS.HLTRVLK.LIAKVAIVCWNNQVGA.QLKEQSSLM.QLLALKFILQPIIIN
TKQCLVQVWLHHMLQD..QCFKVIWLRNIKG.I.ILKNC.NCLKTSS.AQQQHYIVKRIR
RFIHHVSKVQV.LMLKKLSKLNIMLLETMAKLKLISNEREINLISQLQFINL.KVSKNCI
IKLM.QQNK.IKVNLPLNHKPC.ILIGRK.FFVIKKHKFDLLLMLVNLVRN.KNRWQMVI
S.KVLYVLKKPRIVIRS..VFLL.DLMVILRTYKHLKHRFIRRFLKWSTINQMIQLIKT
NWSTMNQLLLKATTILPC.HNQRLGAMLIMSKMVGS.N.HRRVQKELF.ELLRIRLRIKQ
FIFWKEMQRIIHILPFLQIKMEIGTKSLPRQLS.EMLRIFLLKF.IKMEMLFGKVRFYHL
IVKISIIIQSKVMVIIVWMLFSGVV.IRMAKL.QMVFILIAYVTHQ.QKEQIVRSQTLKF
K.VLSHQIFLHELSLMKLIEH.A.PCLRKWMFLHIVYN.FYLML.KMKNMGMRLLTIIS
I.IKKVK.HFLKRLR.ERVRLR.TLRP.HLLWKIKLVILQR.NCLTS.IRQ.YQRKKTL
.FLTVSNILIT.RKNLCLFLKKEK..TRI.KK.H.LSLKLQLLLNHCLKK.LNQEMRKSS
LLQTIIVAE.LRSYHLNITGILLTI

SEQ ID NO. 4418 
STRAIN M781 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTVKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKSESK 
SSLAVLDTSKITKLQATTQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNILHGTHVAGIFVG 
NSKRPAINSLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAIIDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAWVAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKILKVRT 
LKVRLH.LSWVDLIL.LKSLMLQMQVLLVSLFLTIKKNVEIF.FLTVNYLWGLLVK.MA
SV.KILQVS.HLTRVLK.LIAKVAIVCWNNQVGA.QLKEQSSLM.QLLALKFILQPIIIN
TKQCLVQVWLHHMLQD..QCFKVIWLRNIKG.I.ILKNC.NCLKTSS.AQQQHYIVKRIR
RFIHHVSKVQV.LMLKKLSKLNIMLLETMAKLKLISNEREINLISQLQFINL.KVSKNCI
IKLM.QQNK.IKVNLPLNHKPC.ILIGRK.FFVIKKHKFDLLLMLVNLVRN.KNRWQMVI
S.KVLYVLKKPRIVIRS..VFLL.DLMVILRTYKHLKHRFIRRFLKWSTINQMIQLIKT
NWSTMNQLLLKATTILPC.HNQRLGAMLIMSKMVGS.N.HRRVQKELF.ELLRIRLRIKQ
FIFWKEMQRIIHILPFLQIKMEIGTKSLPRQLS.EMLRIFLLKF.IKMEMLFGKVRFYHL
IVKISIIIQSKVMVIIVWMLFSGVV.IRMAKL.QMVFILIAYVTHQ.QKEQIVRSQTLKF
K.VLSHQIFLHELSLMKLIEH.A.PCLRKWMFLHIVYN.FYLML.KMKNMGMRLLTIIS
I.IKKVK.HFLKRLR.ERVRLR.TLRP.HLLWKIKLVILQR.NCLTS.IRQ.YQRKKTL
.FLTVSNILIT.RKNLCLFLKKEK..TRI.KK.H.LSLKLQLLLNHCLKK.LNQEMRKSS
LLQTIIVAE.LRSYHLNITGILLTI

SEQ ID NO. 4419 
STRAIN JM9130013 

EEQELKNQEQSPVIANVAQQPSPSVTTNTVEKTSVTAASASNTAKEMGDTSVKNDKTEDE
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNVVTNASTAIAQKVPSAYEEVKPESK 
SSLAVLDTSKITKLQAITQRGKGNVVAIIDTGFDINHDIFRLDSPKDDKHSFKTKTEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG 
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSIGK
TADSLIALNDKVKLALKLASEKGVAVVVAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGVVGIVIFNDQEKRGNFLIPYRELPVGIISKVDG 
ERIKNTSSQLTFNQSFEVVDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGVVDAEKAIQAQYYITGNDGKAKINLKRMGDKFDITVTIHKLVEGVKELY 
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTLSKGSFYYKPNDTTHKD 
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT 
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS
YRKNFHNNPKQSDGHYRMDALQWSGLDKDGKVVADGFYTYRLRYTPVAEGANSQESDFKV
QVSTKSPNLPSRAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHVVKDEEYGDETSYHYF
HIDQEGKVTLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAVVSEKENAI
VISNSFKYFDNLKKEPMFISKKEKVVNKNLEEIILVKPQTTVTTQSLSKEITKSGNEKVL
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4420 
STRAIN 090 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTVKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKPESK 
SSLAVFDTSKITKLQAITQRGKGNVVAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGVVGIVIFNDQEKRGNFLIPYRELPVGVISKVDG 
ERIKNTSSQLTFNQSFEVVDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGVVDAEKAIQAQYYVTGNDGKAKINLKRVGDKFDITVTIHKLVEGVKELY
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTLSKGSFYYKPNDTTHKD 
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS 
YRKNFHNNPKQSDGHYRMDAFQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV 
QVSTKSPNLPLLAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHWKDEEYGDETSYHYF
HIDQEGKVTLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAWSEKENAI
VISNSFKYFDNLKKESMFISKEGKVWKNLEEITLVKPQTTVTTQSLSKEITKSGNEKVL
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4421 
STRAIN CJB110 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKPESK 
SSLAVFDTSKITKLQAITQRGKGNVVAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG 
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDWYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGWGIVIFNDQEKRGNFLIPYRELPVGVISKVDG 
ERIKNTSSQLTFNQSFEWDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQNHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGWDAEKAIQAQYYVTGNDGKAKINLKRVGDKFDITVTIHKLVEGVKELY
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTLSKGSFYYKPNDTTHKD 
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS
YRKNFHNNPKQSDGHYRMDAFQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV
QVSTKSPNLPLLAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHVVKDEEYGDETSYHYF
HIDQEGKVTLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAWSEKENAI
VISNSFKYFDNLKKESMFISKEGKVVNKNLEEITLVKPQTTVTTQSLSKEITKSGNEKVL
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4422 
STRAIN 1169NT 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNMGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKPKSK
SSLAVLDTSKITKLQAITQRGKGNVVAIIDTGFDINHDIFRLDSPKDDKHSFKNKAEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG 
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSIGK
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE
DTLSVASYESLKTISEVVETTIEGKLVKLPIVTSKPFDKGKAYDWYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGWGIVIFNDQEKRGNFLIPYRELPVGVISKVDG 
ERIKNTSSQLTFNQRFEVVDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK
AFYSPRQQGAGWDAEKAIQAQYYVTGNDGKAKINLKRVGDKFDITVTIHKLVEGVKELY
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFASLQALETPIYKTLSKGSFYYKPNDTTHKD
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS
YRKNFHNNPKQSDGHYRMDALQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV
QVSTKSPNLPSRAQFDETNRTLSLAMPKGSSYVPIYRLQLVLSHWKDEEYGDETSYYYF
HIDQEGKATLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAWSEKENAI
VISNSFKYFDNLKKEPMFISKKEKWNKNLEEIILVKPHTTVTTQSLSKEITKSGNEKVL
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4501 
STRAIN 2603 

ATGAAAAAGATTAGAAAAAGTTTAGGACTTCTACTATGTTGCTTTTTAGGATTGGTACAA 
TTAGCGTTTTTTTCGGTAGCCAGTGTAAATGCTGATACCCCTAATCAACTAACAATCACA 
GAGATAGGACTTCAGCCAAATACTACAGAGGAGGGGATTTCTTATCGTTTATGGACTGTG
ACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGATAGCGAATTGAACCAGAAG
TATAAGAGTATCTTGACTTCTCCTACTGATACTAATGGTCAGACAAAGATAGCACTCCCA
AATGGTTCGTACTTTGGTCGTGCTTATAAAGCTGATCAAAGCGTTTCAACAATAGTACCT
TTTTATATTGAATTACCAGATGATAAGTTATCAAATCAATTACAGATAAATCCTAAGCGA
AAAGTTGAAACAGGCCGATTAAAACTTATTAAATATACAAAAGAAGGAAAGATAAAGAAA
AGGCTATCCGGAGTAATATTTGTATTATACGATAACCAGAATCAGCCAGTTCGCTTTAAA
AATGGACGATTTACGACCGATCAAGATGGGATTACTTCATTAGTAACTGATGATAAGGGA
GAAATTGAGGTTGAAGGTTTATTACCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTA
ACTGGTTACCGTATATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAG
GAAGTAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAACCATCACAA
CCGCTTTTTCCACAATCATTTCTTCCTAAAACAGGAATGATTATTGGTGGAGGACTGACA
ATTCTTGGTTGTATTATTTTGGGAATTTTGTTTATCTTTTTAAGAAAAACTAAAAATAGC
AAATCTGAAAGAAACGATACAGTA

SEQ ID NO. 4502 
STRAIN 090 

GATACCCCTAATCAACTAACAATCACAC
AGATAGGACTTCAGCCAAATACTAGAGAGGAGGGGATTTCTTATCGTTTA
TGGACTGTGACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGA
TAGCGAATTGAACCAGAAGTATAAGAGTATCTTGACTTCTCCTACTGATA
CTAATGGtCAGACAAAGATAGCACTCCCAAATGGTTCGTACTTTGGTCGT
GCTTATAAAGCTGATCAAAGCGTTTCAACAATAGTACCTTTTTATATTGA
ATTACCAGATGATAAGTTATCAAATCAATTACAGATAAATCCTAAGCGAA
AAGTTGAAACAGGCCGATTAAAACTTATTAAATATACAAAAGAAGGAAAG
ATAAAGAAAAGGCTATCAGGAGTAATATTTGTATTATACGATAACCAGAA
TCAGCCAGTTCGCTTTAAAAATGGACGATTTACGACCGATCAAGATGGGA
TTACTTCATTAGTAACTGATGATAAGGGAGAAATTGAGGTTGAAGGTTTA
TTACCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTAACTGGtTACCG
TATATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAGG
AAGTaGAGGTaGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAA 
CCATCACAACCG 

SEQ ID NO. 4503 
STRAIN H36B 

GATACCCCTAATCAACTAACAATCACACAGA
TAGGACTTCAGCCAAATACTACAGAGGAGGGGATTTCTTATCGTTTATGG
ACTGTGACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGATAG
CGAATTGAACCAGAAGTATAAGAGTATCTTGACTTCTCCTACTGATACTA
ATGGtCAGACAAAGATAGCACTCCCAAATGGTTCGTACTTTGGTCGTGCT
TATAAAGCTGATCAAAGCGTTTCAACAATAGTACCTTTTTATATTGAATT
ACCAGATGATAAGTTATCAAATCAATTAGAGATAAATCCTAAGCGAAAAG
TTGAAACAGGCCGATTAAAACTTATTAAATATACAAAAGAAGGAAAGATA
AAGAAAAGGCTwTCCGGAGTAATATTTGTATTATACGATAACCAGAATCA
GCCAGTTCGCTTTAAAAATGGACGATTTAGGACCGATCAAGATGGGATTA
CTTCATTAGTAACTGATGATAAGGGAGAAATTGAGGTTGAAGGTTTATTA
CCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTAACTGGTTACCGTAT
ATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAGGAAG
TAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAACCA
TCACAACCGC

SEQ ID NO. 4504 
STRAIN 18RS21 

GATACCCCTAATCAACTAACAATCACACAG
ATAGGACTTCAGCCAAATACTACAGAGGAGGGGATTTCTTATCGTTTATG
GACTGTGACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGATA
GCGAATTGAACCAGAAGTATAAGAGTATCTTGACTTCTCCTACTGATACT
AATGGtCAGACAAAGATAGCACTCCCAAATGGTTCGTACTTTGGTCGTGC
TTATAAAGCTGATCAAAGCGTTTCAACAATAGTACCTTTTTATATTGAAT
TACCAGATGATAAGTTATCAAATCAATTACAGATAAATCCTAAGCGAAAA
GTTGAAACAGGCCGATTAAAACTTATTAAATATACAAAAGAAGGAAAGAT
AAAGAAAAGGCTATCCGGAGTAATATTTGTATTATACGATAACCAGAATC
AGCCAGTTCGCTTTAAAAATGGACGATTTACGACCGATCAAGATGGGATT
ACTTCATTAGTAACTGATGATAAGGGAGAAATTGAGGTTGAAGGTTTATT
ACCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTAACTGGTTACCGTA
TATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAGGAA
GTAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAACC
ATCACAACC 

SEQ ID NO. 4505 
STRAIN CJB110 

GATACCCCTAATCAACTAACAATCACACA
GATAGGACTTCAGCCAAATACTACAGAGGAGGGGATTTCTTATCGTTTAT
GGaCTGTGACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGAT
AGCGAATTgAACCAGAAGTATAAGAGTATCTTGACTTCTCctACTGATAc
TAATGGTCAGACAAAGATAGCACTCCCAAATGGTTcGTACTTTGGTCGTG
CTTATAAAGCTGATCAAAGCGTTTCAACAATAGTACCTTTTTATATTGAA
TTACCAGATGATAAGTTATCAAATCAATTACAGatAAATCCTAAGCGAAA
AGTTGAAACAGGCCGATTaaAACTTATTAAATATACAAAAGAAGGAAAGA
TAAAGAAAAGGCTaTCAGGAGTAATATTTGTATTATACGATAACCAGAAT
CAGCCAGTTCGCTTTAAAAATGGACGATTTACGACCGATCAAGATGGGAT 
TACTTCATTAGTAACTGATGATAAGGGAGAAATTGAGGTTGAAGGTTTAT 
TACCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTAACTGGTTaCCGT 
ATATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAGGA 
AGTAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAAC 
CATCACAACC 

SEQ ID NO. 4506 
STRAIN 1169NT 

GATACCCCTAATCAACTAACAATCACACAG
ATAGGACTTCAGCCAAATACTACAGAGGAGGGGATTTCTTATCGTTTATG
GACTGTGACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGATA
GCGAATTGAACCAGAAGTATAAGAGTATCTTGACTTCTCCTACTGATACT
AATGGtCAgaCAAAGATAGCACTCCCAAATGGTTCGTACTTTGGTCGTGC
TTATAAAGCTGATCAAAGCGTTTCAACAATAGTACCTTTTTATATTGAAT
TACCAGATGATAAGTTATCAAATCAATTACAGATAAATCCTAAGCGAAAA
GTTGAAACAGGCCGATTAAAACTTATTAAATATACAAAAGAAGGAAAGAT
AAAGAAAAGGCTATCAGGAGTAATATTTGTATTATACGATAACCAGAATC
AGCCAGTTCGCTTTAAAAATGGACGATTTACGACCGATCAAGATGGGATT
ACTTCATTAGTAACtgaTGATAAGGGAGAAATTGAGGTTGAAGGTTTATT
ACCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTAACTGGTTACCGTA
TATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAGGAA
GTAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAACC
ATCACAACC 

SEQ ID NO. 4507 
STRAIN 2603 

MKKIRKSLGLLLCCFLGLVQLAFFSVASVNADTPNQLTITQIGLQPNTTEEGISYRLWTV 
TDNLKVDLLSQMTDSELNQKYKSILTSPTDTNGQTKIALPNGSYFGRAYKADQSVSTIVP 
FYIELPDDKLSNQLQINPKRKVETGRLKLIKYTKEGKIKKRLSGVIFVLYDNQNQPVRFK 
NGRFTTDQDGITSLVTDDKGEIEVEGLLPGKYIFREAKALTGYRISMKDAVVAVVANKTQ 
EVEVENEKETPPPTNPKPSQPLFPQSFLPKTGMIIGGGLTILGCIILGILFIFLRKTKNS
KSERNDTV

SEQ ID NO. 4508
STRAIN 090 

DTPNQLTITQIGLQPNTTEEGISYRLWTVTDNLKVDLLSQMTDSELNQKYKSILTSPTDT 
NGQTKIALPNGSYFGRAYKADQSVSTIVPFYIELPDDKLSNQLQINPKRKVETGRLKLIK 
YTKEGKIKKRLSGVIFVLYDNQNQPVRFKNGRFTTDQDGITSLVTDDKGEIEVEGLLPGK 
YIFREAKALTGYRISMKDAWAWANKTQEVEVENEKETPPPTNPKPSQP 

SEQ ID NO. 4509 
STRAIN H36B 

DTPNQLTITQIGLQPNTTEEGISYRLWTVTDNLKVDLLSQMTDSELNQKYKSILTSPTDT 
NGQTKIALPNGSYFGRAYKADQSVSTIVPFYIELPDDKLSNQLQINPKRKVETGRLKLIK 
YTKEGKIKKRLSGVIFVLYDNQNQPVRFKNGRFTTDQDGITSLVTDDKGEIEVEGLLPGK 
YIFREAKALTGYRISMKDAWAWANKTQEVEVENEKETPPPTNPKPSQP

SEQ ID NO. 4510 
STRAIN 18RS21 

DTPNQLTITQIGLQPNTTEEGISYRLWTVTDNLKVDLLSQMTDSELNQKYKSILTSPTDT 
NGQTKIALPNGSYFGRAYKADQSVSTIVPFYIELPDDKLSNQLQINPKRKVETGRLKLIK 
YTKEGKIKKRLSGVIFVLYDNQNQPVRFKNGRFTTDQDGITSLVTDDKGEIEVEGLLPGK 
YIFREAKALTGYRISMKDAWAWANKTQEVEVENEKETPPPTNPKPSQ 

SEQ ID NO. 4511 
STRAIN 1169NT 

DTPNQLTITQIGLQPNTTEEGISYRLWTVTDNLKVDLLSQMTDSELNQKYKSILTSPTDT 
NGQTKIALPNGSYFGRAYKADQSVSTIVPFYIELPDDKLSNQLQINPKRKVETGRLKLIK 
YTKEGKIKKRLSGVIFVLYDNQNQPVRFKNGRFTTDQDGITSLVTDDKGEIEVEGLLPGK 
YIFREAKALTGYRISMKDAVVAWANKTQEVEVENEKETPPPTNPKPSQ

SEQ ID NO. 4601 
STRAIN A909 

TGACAAATATTATTTTAGCCAACGTGGTTTAGAGCAAGCAGGTGTAACTATATTACCTTT
CTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCAGGAAATGCTTTTCGTCCAGA
TAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTATCATTTTAAACGATATCATGA
ATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGTGTAGCTGGGGCACATGGAAA
AACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAATATTACAGACACTTCTTTCCT
AATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAATTAGTTTGTGTTTGAAGCTGA
TGAATACGAACGTCATTTTATGCCGTACCATCCAGAATACTCAATTATTACCAATATTGA
TTTTGACCATCCTGATTATTTTACAGGCCTAGAGGACGTATTCAATGCCTTTAATGACTA
TGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGAAGATCCAAAACTTCATGAAAT
CACTTCTGAGGCACCAATATATTATTATGGTTTTGAAGATTCAAATGATTTTATAGCAAA
AGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTTTTCTATAACCAAGAAGAAAT
TGGTCAGTTTCATGTACCAGCATACGGTAAACATAATATCTTAAATGCAACTGCTGTTAT
TGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAGCTGAGCATTTGAAGACATT
TTCAGGGGTAAAGCGTCGTTTTACTGAGAAGATTATTGACGATACTGTCATTATTGATGA
CTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGATGCTGCTCGACAAAAATACCC
GTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTCACTCGTACGATAGCTCTTTT
AGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTTATCTCGCTCAAATATATGG
TTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAAGATTTAGCTGCTAAGATTGT
CAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCTTTACTCAATCATGATAATGC
TGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTATGAGCGCTCTTTTGAAGAATT
ATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4602 
STRAIN 1169NT 

AAAAGCAGGCTCTAGTGACGTTGACAAATATTATTTTACCCAACGTGGTTTAGAGCAAGC
AGGTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGC
AGGAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTA
TCATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGG
TGTAGCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAA
TATTACAGACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAA
TTACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATA
CTCAATTATTACCAATATTGATTTTGACCATCCTGATTATTTTACAGGCCTAGAGGACGT
ATTCAATGCCTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGA
AGATCCAAAACTTCATGAAATCACTTCTGAGGCACCAATATATTATTATGGTTTTGAAGA
TTCAAATGATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGT
TTTCTATAACCAAGAAGAAATTGGTCAGTTTCATGTACCAGCATACGGTAAACATAATAT
CTTAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGT
AGCTGAGCATTTGAAGACATTTTCAGGGGTAAAGCGTCGTTTTACTGAGAAGATTATTGA
CGATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGA
TGCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTT
CACTCGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGT
TTATCTCGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGA
AGATTTAGCTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCC
TTTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTA
TGAGCGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4603
STRAIN 090 

AAAGCAGGCTCTAGTGACGTTGACAAATATTATTTTACCCAACGTGGTTTAGAGCAAGCA 
GGTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCA 
GGAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTAT
CATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGT
GTAGCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAAT
ATTACAGACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAAT
TACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATAC
TCAATTATTAGCAATATTGATTTTGACCATCCTGATTATTTTACAGGCCTAGAGGACGTA
TTCAATGCTTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGAA
GATTCAAAACTTCATGAAATCACTTCTAAGGCACCAATATATTATTATGGTTTTGAAGAT
TCAAATGATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTT
TTCTATAACCAAGAAGAAATTGGTCAGTTTCATGTACCAGCATACGGTAAACATAATATC
TTAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTA
GCTGAGCATTTGAAGACATTTTCAGGGGTAAAACGTCGTTTTACTGAGAAGATTATTGAC
GATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGAT
GCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTC
ACTCGTACGATAGCTCTTTTAGACGATTTTGCCCATGCTTTGAGTCAAGCGGATAGCGTT
TATCTTGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAA
GATTTAGCTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCT
TTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTAT
GAGCGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4604 
STRAIN H36B 

AAAAGCAGGCTCTAGTgACGTTgACAAATATtATTTTACTCAACGTGGTTtAGAGCAAGCAGGT 
ATAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCAGGA
AATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTATCAT
TTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGTGTA
GCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAATATT
ACAGACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAATTAC
TTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATACTCA
ATTATTACCAATATTGATTTTGACCATCCTGATTATTTTAGAGGCCTAGAGGACGTATTC
AATGCTTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGAAGAT
CCAAAACTTCATGAAATCACTTCTGAGGCACCAATATATTATTATGGTTTTGAAGATTCA
AATGATTTTATAGCAAAAGATATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTTTTC
TATAACCAAGAAGAAATTGGTCAGTTTCACGTACCAGCATACGGTAAACATAATATCTTA
AATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAGCT
GAGCATTTGAAGACATTTTCAGGGGTAAAACGTCGTTTTACTGAGAAAATTATTGACGAT
ACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGATGCT
GCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTCACT
CGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTTAT
CTCGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAAGAT
TTAGCTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCTTTA
CTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTATGAG
CGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4605 
STRAIN 18RS21 

AAAGCAGGCTCTAGTGACGTTGACAAATATTATTTTACCCAACGTGGTTTAGAGCAAGCA 
GGTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCA
GGAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTAT
CATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGT
GTAGCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAAT
ATTACAGACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAAT
TACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATAC
TCAATTATTACCAATATTGATTTTGACCATCCTGATTATTTTACAGGCTTAGAGGACGTA
TTCAATGCCTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGAA
GATCCAAAACTTCATGAAATCACTTCTGAGGCACCAATATATTATTATGGTTTTGAAGAT
TCAAATGATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTT
TTCTATAACCAAGAAGAAATTGGTCAGTTTCATGTACCAGCATACGGTAAACATAATATC
TTAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTA
GCTGAGCATTTGAAGACGTTTTCAGGGGTAAAGCGTCGTTTTACTGAGAAGATTATTGAC
GATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGAT
GCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTC
ACTCGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTT
TATCTCGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAA
GATTTAGCTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCT
TTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTAT
GAGCGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4606 
STRAIN M732 

AAAAGCAGGCTCTAGTGACGTtGACAAATAtTATTTTACCCAACGTGGTTTAGAGCAAGCAG 
GTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCAG
GAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTATC
ATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGTG
TAGCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAATA
TTACAGACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAATT
ACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATACT
CAATTATTACCAATATTGATTTTGACCATCCTGATTATTTTACAGGCCTAGAGGACGTAT
TCAATGCCTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGAAG
ATCCAAAACTTCATGAAATCACTTCTGAGGCACCAATATATTATTATGGTTTTGAAGATT
CAAATGATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTTT
TCTATAACCAAGAAGAAATTGGTCAGTTTCATGTACCAGCATACGGTAAACATAATATCT
TAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAG
CTGAGCATTTGAAGACATTTTCAGGGGTAAAGCGTCGTTTTACTGAGAAGATTATTGACG
ATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGATG
CTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTCA
CTCGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTT
ATCTCGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAgGTAGAAG
ATTTAGCTGCTAAgATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCTT
TACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTATG
AGCGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4607 
STRAIN M781 

AAAG CAG G C T CT AG T G AC GT t GAC AAAT AT T AT T T T AC C C AAC GT GGT T TAG AG C AAG CAG 

GTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCAG 
GAAAT GCTTTTCGTC CAG AT AAC AAT GAAG AGT T GG C T T AT GT T AT T G AAAAG GG C TAT C 

ATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGT 
GT AG C T G G G G C AC AT G GAAAAAC C T C AAC GAC AGGT T TAT TAG CT CAT GT T T T AAAAAA 
TAT T AC AG AC AC T T CT T T C C T AAT T GGAG AT G GT AC AG GACGT G G T T CT G C T AAT G C T AA 

TTACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATA 
C T C AAT TAT T AC C AAT AT T GAT T T T GAC CAT C C T GAT TAT T T T AC AGG C C T AGAG G AC G T 

ATTC AATGCCTTT AATG ACT AT GCT AAGC AAGT TC AAAAAGGT TT AT TC ATT TAT GGAG A 
AG AT C C AAAAC T T CAT GAAAT C AC T T C T G AG G C ACC AAT AT AT TAT TAT GGT T T T GAAG A 

TTCAAATGATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGT 
T T T C TAT AAC C AAG AAG AAAT T G GT C AGT T T CAT GT AC CAG CAT AC GGT AAAC AT AAT AT 

CTTAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGT 
AGCTGAGCATTTGAAGACATTTTCAGGGGTAAAGCGTCGTTTTACTGAGAAGATTATTGA
CGATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGA
TGCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTT 
CACTCGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGT
TTATCTCGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGA
AGATTTAGCTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCC
TTTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTA 
TGAGCGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4608 
STRAIN CJB110 

AAAAAGCAGGCTCTAGTGACGTtGACAAATAtTATTTTACCCAACGTGGTTTAGAGCAAGCA 
GGTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCA 
GGAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTAT
CATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGT
GTAGCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAAT
ATTACAGACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAAT
TACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATAC 
TCAATTATTACCAATATTGATTTTGACCATCCTGATTATTTTAGAGGCCTAGAGGACGTA
TTCAATGCTTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGAA
GATTCAAAACTTCATGAAATCACTTCTAAGGCACCAATATATTATTATGGTTTTGAAGAT
TCAAATGATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTT
TTCTATAACCAAGAAGAAATTGGTCAGTTTCATGTACCAGCATACGGTAAACATAATATC
TTAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTA 
GCTGAGCATTTGAAGACATTTTCAGGGGTAAAACGTCGTTTTACTGAGAAGATTATTGAC
GATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGAT 
GCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTC
ACTCGTACGATAGCTCTTTTAGACGATTTTGCCCATGCTTTGAGTCAAGCGGATAGCGTT
TATCTTGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAA 
GATTTAGCTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCT
TTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTAT 
GAGCGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4609 

STRAIN JM9130013 (reverse complement) 

GTTCAAAAAAGCAGGCTCTAGTGACGTTGACAAATATTATTTTACTCAACGTGGTTTAGA
GCAAGCAGGTATAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGAT 
TATTGCAGGAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAA 
GGGCTATCATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAG
TCTAGGTGTAGCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTT
AAAAAATATTACAGACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAA
TGCTAATTACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCC 
AGAATACTCAATTATTACCAATATTGATTTTGACCATCCTGATTATTTTACAGGCCTAGA
GGACGTATTCAATGCTTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTA
TGGAGAAGATCCAAAACTTCATGAAATCACTTCTGAGGCACCAATATATTATTATGGTTT
TGAAGATTCAAATGATTTTATAGCAAAAGATATCACTCGAACTGTTAATGGTTCTGACTT 
TAAGGTTTTCTATAACCAAGAAGAAATTGGTCAGTTTCACGTACCAGCATACGGTAAACA
TAATATCTTAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGC 
ATTAGTAGCTGAGCATTTGAAGACATTTTCAGGGGTAAAACGTCGTTTTACTGAGAAAAT
TATTGACGATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGAC 
ATTAGATGCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCA
TACGTTCACTCGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGA
TAGCGTTTATCTCGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAA 
GGTAGAAGATTTAGCTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGT 
CTCGCCTTTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCA 
ATTGTATGAGCGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4610 

STRAIN COH1 reverse complement 

CAGGCTCTAGTGACGTGACAAATATtATTTTACCCAACGTGGTTAGAGCAAGCAGGTGTAA
CTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCAGGAAATG
CTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTATCATTTTA
AACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGTGTAGCTG
GGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAATATTACAG
ACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAATTACTTTG 
TGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATACTCAATTA
TTACCAATATTGATTTTGACCATCCTGATTATTTTACAGGCCTAGAGGACGTATTCAATG
CCTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGAAGATCCAA
AACTTCATGAAATCACTTCTGAGGCACCAATATATTATTATGGTTTTGAAGATTCAAATG
ATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTTTTCTATA 
ACCAAGAAGAAATTGGTCAGTTTCATGTACCAGCATACGGTAAACATAATATCTTAAATG
CAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAGCTGAGC 
ATTTGAAGACATTTTCAGGGGTAAAGCGTCGTTTTACTGAGAAGATTATTGACGATACTG 
TCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGATGCTGCTC
GACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTCACTCGTA
CGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTTATCTCG 
CTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAAGATTTAG 
CTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCTTTACTCA 
ATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTATGAGCGCT
CTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4611 
STRAIN 2603 

atgtcaaaaacttatcattttattggtattaaaggatccggaatgagtgccctagcactg
atgcttcatcaaatgggacataacgtccaaggaagtgacgttgacaaatattattttacc
caacgtggtttagagcaagcaggtgtaactatattacctttctcaccgaataatatcagt
gaggatttagagattattgcaggaaatgcttttcgtccagataacaatgaagagttggct
tatgttattgaaaagggctatcaatttaaacgatatcatgaatttctcggagattttatg
cgtcagttcactagtctaggtgtagctggggcacatggaaaaacctcaacgacaggttta
ttagctcatgttttaaaaaatattacagacacttctttcctaattggagatggtacagga
cgtggttctgctaatgctaattactttgtgtttgaagctgatgaatacgaacgtcatttt
atgccgtaccatccagaatactcaattattaccaatattgattttgaccatcctgattat
tttacaggcttagaggacgtattcaatgcctttaatgactatgctaagcaagttcaaaaa
ggtttattcatttatggagaagatccaaaacttcatgaaatcacttctgaggcaccaata
tattattatggttttgaagattcaaatgattttatagcaaaagacatcactcgaactgtt
aatggttctgactttaaggttttctataaccaagaagaaattggtcagtttcatgtacca
gcatacggtaaacataatatcttaaatgcaactgctgttattgctaacctttacataatg
ggaattgatatggcattagtagctgagcatttgaagacgttttcaggggtaaagcgtcgt
tttactgagaagattattgacgatactgtcattattgatgactttgctcaccatcctact
gagattattgcgacattagatgctgctcgacaaaaatacccgtcaaaagaaattgtagct
attttccaaccgcatacgttcactcgtacgatagctcttttagacgaatttgcccatgcc
ttgagtcaagcggatagcgtttatctcgctcaaatatatggttctgctagagaagtagat
aatggtgaggtgaaggtagaagatttagctgctaagattgtcaaacactcagatttagtg
acagtcgaaaatgtctcgcctttactcaatcatgataatgctgtctatgtctttatgggt
gctggagacattcaattgtatgagcgctcttttgaagaattattagctaacctaactaaa 
aatacacaa 

SEQ ID NO. 4612 

STRAIN COH1 reverse complement 

CAGGCTCTAGTGACGTtGACAAATAtTATTTTACCCAACGTGGtTTAGAGCAAGCAGGTGTAA
CTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCAGGAAATG
CTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTATCATTTTA 
AACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGTGTAGCTG 
GGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAATATTACAG
ACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAATTACTTTG 
TGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATACTCAATTA 
TTACCAATATTGATTTTGACCATCCTGATTATTTTACAGGCCTAGAGGACGTATTCAATG
CCTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGAAGATCCAA
AACTTCATGAAATCACTTCTGAGGCACCAATATATTATTATGGTTTTGAAGATTCAAATG 
ATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTTTTCTATA 
ACCAAGAAGAAATTGGTCAGTTTCATGTACCAGCATACGGTAAACATAATATCTTAAATG
CAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAGCTGAGC 
ATTTGAAGACATTTTCAGGGGTAAAGCGTCGTTTTAGTGAGAAGATTATTGACGATACTG
TCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGATGCTGCTC 
GACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTCACTCGTA 
CGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTTATCTCG 
CTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAAGATTTAG
CTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCTTTACTCA
ATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTATGAGCGCT 
CTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4613 
STRAIN A909 frame: 2

DKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGYHFKRYHE
FLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANANYFVFEAD
EYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGEDPKLHEI
TSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNILNATAVI 
ANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLDAARQKYP
SKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVEDLAAKIV
KHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ 

SEQ ID NO. 4614 
STRAIN 1169NT frame: 2 

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY 
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE 
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ 

SEQ ID NO. 4615 
STRAIN 090 frame: 1

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DSKLHEITSKAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDDFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4616 
STRAIN H36B frame: 2 

KAGSSDVDKYYFTQRGLEQAGITILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4617 
STRAIN 18RS21 frame: 1 

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY 
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4618 
STRAIN M732 frame: 2 

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY 
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD 
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4619 

STRAIN JM9130013 frame: 2 

FKKAGSSDVDKYYFTQRGLEQAGITILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEK
GYHFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSAN 
ANYFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIY 
GEDPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKH
NILNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIAT
LDAARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVK 
VEDLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4620 
STRAIN M781 frame: 1 

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE 
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4621 
STRAIN CJB110 frame: 3 

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY 
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DSKLHEITSKAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDDFAHALSQADSVYLAQIYGSAREVDNGEVKVE 
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4622 
STRAIN 2603 frame: 1 

MSKTYHFIGIKGSGMSALALMLHQMGHNVQGSDVDKYYFTQRGLEQAGVTILPFSPNNIS 
EDLEIIAGNAFRPDNNEELAYVIEKGYQFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGL
LAHVLKNITDTSFLIGDGTGRGSANANYFVFEADEYERHFMPYHPEYSIITNIDFDHPDY 
FTGLEDVFNAFNDYAKQVQKGLFIYGEDPKLHEITSEAPIYYYGFEDSNDFIAKDITRTV 
NGSDFKVFYNQEEIGQFHVPAYGKHNILNATAVIANLYIMGIDMALVAEHLKTFSGVKRR
FTEKIIDDTVIIDDFAHHPTEIIATLDAARQKYPSKEIVAIFQPHTFTRTIALLDEFAHA
LSQADSVYLAQIYGSAREVDNGEVKVEDLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMG
AGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4623 
STRAIN COH1 frame: 3 

GSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGYHF 
KRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANANYF
VFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGEDP 
KLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNILN 
ATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLDAA
RQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVEDL 
AAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4701 
STRAIN A909 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG 
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4702 
STRAIN H36B 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA 

SEQ ID NO. 4703 
STRAIN 18RS21 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC 
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG 
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA 

SEQ ID NO. 4704 
STRAIN M732 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG 
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA 

SEQ ID NO. 4705 
STRAIN COH1 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA 

SEQ ID NO. 4706 
STRAIN M781 

TATTTTTTAACAACAAAAAAAGGAAAAGAGC
TAAGGAAAAATGCAGAAAAATTCTATGGAGAATATAAAGAAAATCCAGAA
GAATATCATCAAATAGCTAAAGATAAAGCAAGTGAATATTCAAATTTAGC
TGTTGATACTTTTAAAGATTATAAAGGTAAATTTGAATCAGGTGAATTGA
CAACAGAGGATATCGTCTCAGCCGTTAAGGAAAAAAGCGGAGAAGTAGTT 
GACTTTGCTAATGATTTTGTCAATCAAGCTAAATCAAAATTCTCAGACGA 
GGATACTGCTAAAAAAGAAGATAAGGCTCCTGAAACAAAAGTAGAAGATA
TTGTCATTGATTATAAAGAAAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4707 
STRAIN 2603 

tattttttaacaacaaaaaaaggaaaagagctaaggaaaaatgcagaaaa 
attctatggagaatataaagaaaatccagaagaatatcatcaaatagcta 
aagataaagcaagtgaatattcaaatttagctgttgatacttttaaagat 
tataaaggtaaatttgaatcaggtgaattgacaacagaggatatcgtctc 
agccgttaaggaaaaaagcggagaagtagttgactttgctaatgattttg 
tcaatcaagctaaatcaaaattctcagacgaggatactgctaaaaaagaa 
gataaggctcctgaaacaaaagtagaagatattgtcattgattataaaga 
aaacacagaagataaagaaaaa 

SEQ ID NO. 4708 
STRAIN 090 

TATTTTTTaACaACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA 
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAa
GATAAGGCTCCTGAAACAAAaGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA 

SEQ ID NO. 4709 
STRAIN CJB110 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAA
ATGCAGAAAAATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCAT
CAAATAGCTAAAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATAC
TTTTAAAGATTATAAAGGTAAATTTGAATCAGGTgAATTGACAACAGAGG
ATATCGTCTCAGCCGtTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCT
AATGATTTTGTCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGC
TAAAAAAGAAGATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTG
ATTATAAAGAAAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4710 
STRAIN 1169NT 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAA 
AATGCAGAAAAATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCA
TCAAATAGCTAAAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATA
CTTTTAAAGATTATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAG
GATATCGTCTCAGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGC 
TAATGATTTTGTCAATCAAGCTAAATCAAAATTCTCAGATGAGGATACTG
CTAAAAAAGAAAATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATT
GATTATAAAGAAAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4711 
STRAIN JM9130013 

TATTTTTTAaCAACAAAAAAAGGAAAAGAGCTAAGGAAAA
ATGCAGAAAAATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCAT
CAAATAGCTAAAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATAC
TTTTAAAGATTATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGG
ATATCGTCTCAGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCT 
AATGATTTTGTCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGC
TAAAAAAGAAGATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTG
ATTATAAAGAAAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4712 
STRAIN 2603 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL 
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4713 
STRAIN A909 frame: 1 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4714 
STRAIN H36B frame: 1 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL 
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE 
DKEK 

SEQ ID NO. 4715 
STRAIN 18RS21 frame: 1 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4716 
STRAIN M732 frame: 1 
YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE 
DKEK 

SEQ ID NO. 4717 
STRAIN COH1 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4718 
STRAIN M781 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4719 
STRAIN 090 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4720 

STRAIN CJB110 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4721 
STRAIN 1169NT frame: 1 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKENKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4722 

STRAIN JM9130013 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO: 4801 
STRAIN 2603 

aatagtactgagacaagtgcttcagtagttcctactacaaatactatcgt 
tcaaactaatgacagtaatcctaccgcaaaatttgtatcagaatcaggac 
aatctgtaataggtcaagtaaaaccagataattctgcggcgcttacaaca 
gttgacacgcctcatcatatttcagctccagatgctttaaaaacaactca 
atcaagtcctgtcgttgagagtacttctactaagttaactgaagagactt 
acaaacaaaaagatggtcaagatttagccaacatggtgagaagtggtcaa 
gttactagtgaggaactcgttaatatggcatacgatattattgctaaaga 
aaacccatctttaaatgcagtcattactactagacgccaagaagctattg 
aagaggctagaaaacttaaagataccaatcagccgtttttaggtgttccc 
ttgttagtcaaggggttagggcacagtattaaaggtggtgaaaccaataa 
tggcttgatctatgcagatggaaaaattagcacatttgacagtagctatg 
tcaaaaaatataaagatttaggatttattattttaggacaaacgaacttt 
ccagagtatgggtggcgtaatataacagattctaaattatacggtctaac 
gcataatccttgggatcttgctcataatgctggtggctcttctggtggaa 
gtgcagcagccattgctagcggaatgacgccaattgctagcggtagtgat 
gctggtggttctatccgtattccatcttcttggacgggcttggtaggttt 
aaaaccaacaagaggattggtgagtaatgaaaagccagattcgtatagta 
cagcagttcattttccattaactaagtcatctagagacgcagaaacatta 
ttaacttatctaaagaaaagcgatcaaacgctagtatcagttaatgattt 
aaaatctttaccaattgcttatactttgaaatcaccaatgggaacagaag 
ttagtcaagatgctaaaaacgctattatggacaacgtcacattcttaaga 
aaacaaggattcaaagtaacagagatagacttaccaattgatggtagagc 
attaatgcgtgattattcaaccttggctattggcatgggaggagcttttt
caacaattgaaaaagacttaaaaaaacatggttttactaaagaagacgtt
gatcctattacttgggcagttcatgttatttatcaaaattcagataaggc
tgaacttaagaaatctattatggaagcccaaaaacatatggatgattatc
gtaaggcaatggagaagcttcacaagcaatttcctattttcttatcgcca
acgaccgcaagtttagcccctctaaatacagatccatatgtaacagagga
agataaaagagcgatttataatatggaaaacttgagccaagaagaaagaa
ttgctctctttaatcgccagtgggagcctatgttgcgtagaacacctttt
acacaaattgctaatatgacaggactcccagctatcagtatcccgactta
cttatctgagtctggtttacccatagggacgatgttaatggcaggtgcaa
actatgatatggtattaattaaatttgcaactttctttgaaaaacatcat
ggttttaatgttaaatggcaaagaataatagataaagaagtgaaaccatc
tactggcctaatacagcctactaactccctctttaaagctcattcatcat
tagtaaatttagaagaaaattcacaagttactcaagtatctatctctaaa
aaatggatgaaatcgtctgttaaaaataaaccatccgtaatggcatatca
aaaagca 

SEQ ID NO: 4802 
STRAIN 090 

AATAGTACTGAGACAAGTGCTTCAGTAGTTCCTACTACAA
ATACTATCGTTCAAACTAATGACAGTAATCCTACCGCAAAATTTGTATCA
GAATCAGGACAATCTGTAATAGGTCAAGTAAAACCAGATAATTCTGCGGC
GCTTACAACAGTTGACACGCCTCATCATATTTCAGCTCCAGATGCTTTAA 
AAACAACTCAATCAAGTCCTGTCGTTGAGAGTACTTCTACTAAGTTAACT
GAAGAGACTTACAAACAAAAAGATGGTAAAGATTTAGCCAACATGGTGAG
AAGTGGTCAAGTTACTAGTGAGGAACTCGTTAATATGGCATACGATATTA
TTGCTAAAGAAAACCCATCTTTAAATGCAGTCATTACTACTAGACGCCAA
GAAGCTATTGAAGAGGCTAGAAAACTTAAAGATACCAATCAGCCGTTTTT
AGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCACAGTATTAAAGGTGGTG 
AAACCAATAATGGCTTGATCTATGCAGATGGAAAAATTAGCACATTTGAC
AGTAGCTATGTCAAAAAATATAAAGATTTAGGATTTATTATTTTAGGACA
AACGAACTTTCCAGAGTATGGGTGGCGTAATATAACAGATTCTAAATTAT 
ACGGTCTAACGCATAATCCTTGGGATCTTGCTCATAATGCTGGTGGCTCT 
TCTGGTGGAAGTGCAGCAGCCATTGCTAGCGGAATGACGCCAATTGCTAG 
CGGTAGTGATGCTGGTGGTTCTATCCGTATTCCATCTTCTTGGACGGGCT 
TGGTAGGTTTAAAACCAACAAGAGGATTGGTGAGTAATGAAAAGCCAGAT
TCGTATAGTACAGCAGTTCATTTTCCATTAACTAAGTCATCTAGAGACGC
AGAAACATTATTAACTTATCTAAAGAAAAGCGATCAAACGCTAGTATCAG
TTAATGATTTAAAATCTTtACCAATTGCTTATACTTTGAAATCACCAATG
GGAACAGAAGTTAGTCAAGATGCTAAAAACGCTATTATGGACAACGTCAC
ATTCTTAAGAAAACAAGGATTCAAAGTAACAGAGATAGACTTACCAATTG
ATGGTAGAGCATTAATGCGTGATTATTCAACCTTGGCTATTGGCATGGGA
GGAGCTTTTTCAACAATTGAAAAAGACTTAAAAAAACATGGTTTTACTAA
AGAAGACGTTGATCCTATTACTTGGGCAGTTCATGTTATTTATCAAAATT
CAGATAAGGCTGAACTTAAGAAATCTATTATGGAAGCCCAAAAACATATG
GATGATTATCGTAAGGCAATGGAGAAGCTTCACAAGCAATTTCCTATTTT
CTTATCGCCAACGACCGCAAGTTTAGCCCCTCTAAATACAGATCCATATG
TAACAGAGGAAGATAAAAGAGCGATTTATAATATGGAAAACTTGAGCCAA
GAAGAAAGAATTGCTCTCTTTAATCGCCAGTGGGAGCCTATGTTGCGTAG
AACACCTTTTACACAAATTGCTAATATGACAGGACTCCCAGCTATCAGTA
TCCCGACTTACTTATCTGAGTCTGGTTTACCCATAGGGACGATGTTAATG 
GCAGGTGCAAACTATGATATGGTATTAATTAAATTTGCAACTTTCTTTGA
AAAACATCATGGTTTTAATGTTAAATGGCAAAGAATAATAGATAAAGAAG
TGAAACCATCTACTGGCCTAATACAGCCTACTAACTCCCTCTTTAAAGCT 
CATTCATCATTAGTAAATTTAGAAGAAAATTCACAAGTTACTCAAGTATC
TATCTCTAAAAAATGGATGAAATCGTCTGTTAAAAATAAACCATCCGTAA 
TGGCATATCAAAAAGCA

SEQ ID NO: 4803 
STRAIN A909 

TACTACAAATACTATCGTTCAAACTAATGACAGTAATCCTACCGCAAAAT
TTGTATCAGAATCAGGACAATCTGTAATAGGTCAAGTAAAACCAGATAAT
TCTGCGGCGCTTACAACAGTTGACACGCCTCATCATATTTCAGCTCCAGA
TGCTTTAAAAACAACTCAATCAAGTCCTGTCGTTGAGAGTACTTCTACTA 
AGTTAACTGAAGAGACTTACAAACAAAAAGATGGTCAAGATTTAGCCAAC
ATGGTGAGAAGTGGTCAAGTTACTAGTGAGGAACTCGTTAATATGGCATA 
CGATATTATTGCTAAAGAAAACCCATCTTTAAATGCAGTCATTACTACTA
GACGCCAAGAAGCTATTGAAGAGGCTAGAAAACTTAAAGATACCAATCAG
CCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCACAGTATTAA 
AGGTGGTGAAACCAATAATGGCTTGATCTATGCAGATGGAAAAATTAGCA 
CATTTGACAGTAGCTATGTCAAAAAATATAAAGATTTAGGATTTATTATT
TTAGGACAAACGAACTTTCCAGAGTATGGGTGGCGTAATATAACAGATTC
TAAATTATACGGTCTAACGCATAATCCTTGGGATCTTGCTCATAATGCTG 
GTGGCTCTTCTGGTGGAAGTGCAGCAGCCATTGCTAGCGGAATGACGCCA 
ATTGCTAGCGGTAGTGATGCTGGTGGTTCTATCCGTATTCCATCTTCTTG 
GACGGGCTTGGTAGGTTTAAAACCAACAAGAGGATTGGTGAGTAATGAAA
AGCCAGATTCGTATAGTACAGCAGTTCATTTTCCATTAAcTAAGTCATCT 
AGAGACGCAGAAACATTATTAACTTATCTAAAGAAAAGCGATCAAACGCT
AGTATCAGTTAATGATTTAAAATCTTTACCAATTGCTTATACTTTGAAAT
CACCAATGGGAACAGAAGTTAGTCAAGATGCTAAAAACGCTATTATGGAC
AACGTCACaTTCTTAAGAAAACAAGGATTCAAAGTAACAGAGATAGACTT
ACCAATTGATGGTAGAGCATTAATGCGTGATTATTCAACCTTGGCTATTG
GCATGGGAGGAGCTTTTTCAACAATTGAAAAAGACTTAAAAAAACATGGT
TTTACTAAAGAAGACGTTGATCCTATTACTTGGGCAGTTCATGTTATTTA
TCAAAATTCAGATAAGGCTGAACTTAAGAAATCTATTATGGAAGCCCAAA
AACATATGGATGATTATCGTAAGGCAATGGAGAAGCTTCACAAGCAATTT
CCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTCTAAATACAGA 
TCCATATGTaACAGAGGAAGATAAAAGAGCGATTTATAATATGGAAAACT
TGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAGTGGGAGCCTATG 
TTGCGTAGAACACCTTTTACACAAATTGCTAATATGACAGGACTCCCAGC
TATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTACCCATAGGGACGA
TGTTAATGGCAGGTGCAAACTATGATATGGTATTAATTAAATTTGCAACT
TTCTTTGAAAAACATCATGGTTTTAATGTTAAATGGCAAAGAATAATAGA
TAAAGAAGTGAAACCATCTACTGGCCTAATACAGCCTACTAACTCCCTCT
TTAAAGCTCATTCATCATTAGTAAATTTAGAAGAAAATTCACAAGTTACT
CAAGTATCTATCTCTAAAAAATGGATGAAATCGTCTGTTAAAAATAAACC 
ATCCGTAATGGCATATCAAAAAGCA

SEQ ID NO: 4804 
STRAIN COH1 

AATAGTACTGAGACAAGTGCTTCAGTAGCTCCTACTACAAAT
ACTATCGTTCAAACTAATGACAGTAATCCTACCGCAAAATTTGCATCAGA
ATCAGGACAATCTGTAATAGGTCAAGTAAAACCAGCTAATTCTGCGGCGC
TTACAACAGTTGACACGCCTCATATTTCAGCTCCAGATGCTTTAAAAACA
ACTCAATCAAGTCCTGTCGTTGAGAGTCCTTCTACTAAGTTAACTGAAGA
GACATACAAACAAAAAGATGGTCAAGATTTAGCCAACATGGTGAGAAGTG
GTCAAGTTACTAGTGAGGAACTCGTCAATATGGCATACGATATTATCGCT 
AAAGAAAACCCATCTTTAAATGCAGTCATTACTACTAGACGCCAAGAAGC
CATTGAAGAGGCTAGAAAACTTAAAGATACTAATCAGCCGTTTTTAGGTG
TTCCcTTGTTAGTCAAGGGGTTAGGGCACAGTATTAAAGGTGGTGAAACC
AATAATGGCTTGATCTATGCAGATGGAAAAATTAGCACATTTGACAGTAG
CTATGTCAAAAAATATAAAGATTTAGGATTTATTATTTTAGGACAAACGA
ATTTTCCAGAGTATGGGTGGCGTAATATAACAGACTCTAAATTATACGGT 
CCAACGCATAATCCTTGGAATCTTGCTCATAACGCTGGTGGCTCTTCTGG 
TGGAAGTGCAGCAGCTATTGCTAGCGGAATGACGCCAATTGCTAGCGGCA 
GTGATGCTGGTGGTTCTATCCGTATTCCATCTTCTTGGACGGGCTTAGTA 
GGTTTAAAACCAACAAGAGGATTGGTGAGTAATGAAAAGCCAGATTCGTA
TAGTACAGCAGTTCATTTTCCATTAACTAAGTCATCTAGAGACGCAGAAA
CATTGTTAACTTACCTAAAGAAAAGCGATCAAACGCTAGTATCAGTTAAT
GATTTAAAATCTTTACCAATTGCTTATACTTTGAAATCACCAATGGGAAC
AGAAGTTAGTCAAGATGCTAAAAATGCTATTATGGACAACGTCACATTCT
TAAGAAAACAAGGATTCAAAGTGACAGAGATAGATTtACCAATTGATGGT
AGAGCATTAATGCGTGATTATTCAACCTTGGCTATTGGCATGGGAGGAGC 
TTTTTCAACAATTGAAAAAGACTTAAAAAAACATGGTTTTACTAAAGAAG
ACGTTGATCCCATTACTTGGGCAGTTCATGTTATTTATCAAAATTCAGAT 
AAGGCTGAACTTAAGAAATCTATTGTGGAAGCCCAAAAACATATGGATGA
TTATCGTAAGGCAATGGAGAAGCTTCACAAGCAATTTCCTATTTTCTTAT 
CGCCAACGACCGCAAgTTTAGCCCCTCTAAATACAGATCCATATGTAACA
GAGAAAGATAAAAGAGCGATTTATAATATGGAAAACTTGAGCCAAGAAGA
AAGAATTGCTCTCTTTAATCGCCAGTGGGAGCCTATGTTGCGTAGAACAC 
CTTTTACACCAATTGCTAATAtGACAGGAGTCCCAGCTATCAGTATCCCG
ACTTACTTATCTGAGTCTGGTTTACCCATAGGGACGATGTTAATGGCAGG 
TGCAAACTATGATATGGTATTAATTAAATTTGCAACTTTCTTTGAAAAAC 
ATCATGGTTTTAATGTTAAATGGCAAAGAATAATAGATAAAGAAGTGAAA
CCATCTGCTGACCTAATACAGCCTACTAACTCCCTCTTTAAAGCTCATTC 
ATCATTAGTAAATTTAGAAGAAAATTCACAAGTTAGTCAAGTATCTATCT
CTAAAAAATGGATGAAATCGTCTGTTAAAAATAAACCATCCGTAATGGCA
TATCAAAAAGCA

SEQ ID NO: 4805 
STRAIN M732 

TCAGTAGCTCCTACTACAAATACTATCGTTCAAACTAATGACAGTAATCC
TACCGCAAAATTTGCATCAGAATCAGGACAATCTGTAATAGGTCAAGTAA 
AACCAGCTAATTCTGCGGCGCTTACAACAGTTGACACGCCTCATATTTCA 
GCTCCAGATGCTTTAAAAACAACTCAATCAAGTCCTGTCGTTGAGAGTCC 
TTCTACTAAGTTAACTGAAGAGACATACAAACAAAAAGATGGTCAAGATT
TAGCCAACATGGTGAGAAGTGGTCAAGTTACTAGTGAGGAACTCGTCAAT
ATGGCATACGATATTATCGCTAAAGAAAACCCATCTTTAAATGCAGTCAT
TACTACTAGACGCCAAGAAGCCATTGAAGAGGCTAGAAAACTTAAAGATA
CTAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCAC 
AGTATTAAAGGTGGTGAAACCAATAATGGCTTGATCTATGCAGATGGAAA 
AATTAGCACATTTGACAGTAGCTATGTCAAAAAATATAAAGATTTAGGAT
TTATTATTTTAGGACAAACGAATTTTCCAGAGTATGGGTGGCGTAATATA 
ACAGACTCTAAATTATACGGTCnAACGCATAATCCTTGGGATCTTGCTCA
TAACGCTGGTGGCTCTTCTGGTGGAAGTGCAGCAGCTATTGCTAGCGGAA
TGACGCCAATTGCTAGCGGCAGTGATGCTGGTGGTTCTATCCGTATTCCA 
TCTTCTTGGACGGGCTTAGTAGGTTTAAAACCAACAAGAGGATTGGTGAG
TAATGAAAAGCCAGATTCGTATAGTACAGCAGTTCATTTTCCATTAACTA
AGTCATCTAGAGACGCAGAAACATTGTTAACTTACCTAAAGAAAAGCGAT
CAAACGCTAGTATCAGTTAATGATTTAAAATCTTTACCAATTGCTTATAC
TTTGAAATCACCAATGGGAACAGAAGTTAGTCAAGATGCTAAAAATGCTA
TTATGGACAACGTCACATTCTTAAGAAAACAAGGATTCAAAGTGACAGAG
ATAGATTTACCAATTGATGGTAGAGCATTAATGCGTGATTATTCAACCTT
GGCTATTGGCATGGGAGGAGCTTTTTCAACAATTGAAAAAGACTTAAAAA
AACATGGTTTTACTAAAGAAGACGTTGATCCCATTACTTGGGCAGTTCAT
GTTATTTATCAAAATTCAGATAAGGCTGAACTTAAGAAATCTATTGTGGA 
AGCCCAAAAACATATGGATGATTATCGTAAGGCAATGGAGAAGCTTCACA
AGCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTCTA 
AATACAGATCCATATGTTACAGAGAAAGATAAAAGAGCGATTTATAATAT
GGAAAACTTGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAGTGGG 
AGCCTATGTTGCGTAGAACACCTTTTACACCAATTGCTAATATGACAGGA
CTCCCAGCTATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTACCCAT
AGGGACGATGTTAATGGCAGGTGCAAACTATGATATGGTATTAATTAAAT
TTGCAACTTTCTTTGAAAAACATCATGGTTTTAATGTTAAATGGCAAAGA
ATAATAGATAAAGAAGTGAAACCATCTGCTGACCTAATACAGCCTACTAA
CTCCCTCTTTAAAGCTCATTCATCATTAGTAAATTTAGAAGAAAATTCAC
AAGTTACTCAAGTATCTATCTCTAAAAAATGGATGAAATCGTCTGTTAAA
AATAAACCATCCGTAATGGCATATCAAAAAGCA

SEQ ID NO: 4806 
STRAIN 18RS21 

AATAGTACTGAGACAAGTGCTTCAGTAGTTCCTACTACAAATACTATCGT 
TCAAACTAATGACAGTAATCCTACCGCAAAATTTGTATCAGAATCAGGAC
AATCTGTAATAGGTCAAGTAAAACCAGATAATTCTGCGGCGCTTACAACA
GTTGACACGCCTCATCATATTTCAGCTCCAGATGCTTTAAAAACAACTCA
ATCAAGTCCTGTCGTTGAGAGTACTTCTACTAAGTTAACTGAAGAGACTT
ACAAACAAAAAGATGGTCAAGATTTAGCCAACATGGTGAGAAGTGGTCAA 
GTTACTAGTGAGGAACTCGTTAATATGGCATACGATATTATTGCTAAAGA
AAACCCATCTTTAAATGCAGTCATTACTAGTAGACGCCAAGAAGCTATTG
AAGAGGCTAGAAAACTTAAAGATACCAATCAGCCGTTTTTAGGTGTTCCC
TTGTTAGTCAAGGGGTTAGGGCACAGTATTAAAGGTGGTGAAACCAATAA
TGGCTTGATCTATGCAGATGGAAAAATTAGCACATTTGACAGTAGCTATG
TCAAAAAATATAAAGATTTAGGATTTATTATTTTAGGACAAACGAACTTT
CCAGAGTATGGGTGGCGTAATATAACAGATTCTAAATTATACGGTCTAAC 
GCATAATCCTTGGGATCTTGCTCATAATGCTGGTGGCTCTTCTGGTGGAA 
GTGCAGCAGCCATTGCTAGCGGAATGACGCCAATTGCTAGCGGTAGTGAT 
GCTGGTGGTTCTATCCGTATTCCATCTTCTTGGACGGGCTTGGTAGGTTT 
AAAACCAACAAGAGGATTGGTGAGTAATGAAAAGCCAGATTCGTATAGTA
CAGCAGTTCATTTTCCATTAACTAAGTCATCTAGAGACGCAGAAACATTA
TTAACTTATCTAAAGAAAAGCGATCAAACGCTAGTATCAGTTAATGATTT
AAAATCTTTACCAATTGCTTATACTTTGAAATCACCAATGGGAACAGAAG 
TTAGTCAAGATGCTAAAAACGCTATTATGGACAACGTCACATTCTTAAGA
AAACAAGGATTCAAAGTAACAGAGATAGACTTAGCAATTGATGGTAGAGC
ATTAATGCGTGATTATTCAACCTTGGCTATTGGCATGGGAGGAGCTTTTT 
CAACAATTGAAAAAGACTTAAAAAAACATGGTTTTACTAAAGAAGACGTT
GATCCTATTACTTGGGCAGTTCATGTTATTTATCAAAATTCAGATAAGGC 
TGAACTTAAGAAATCTATTATGGAAGCCCAAAAACATATGGATGATTATC 
GTAAGGCAATGGAGAAGCTTCACAAGCAATTTCCTATTTTCTTATCGCCA
ACGACCGCAAGTTTAGCCCCTCTAAATACAGATCCATATGTAACAGAGGA
AGatAAAAGAGCGATTTATAATATGGAAAACTTGAGCCAAGAAGAAAGAA
TTGCTCTCTTTAATCGCCAGTGGGAGCCTATGTTGCGTAGAACACCTTTT 
ACACAAATTGCTAATATGACAGGACTCCCAGCTATCAGTATCCCGACTTA
CTTATCTGAGTCTGGTTTACCCATAGGGACGATGTTAATGGCAGGTGCAA 
ACTATGATATGGTATTAATTAAATTTGCAACTTTCTTTGAAAAACATCAT
GGTTTTAATGTTAAATGGCAAAGAATAATAGATAAAGAAGTGAAACCATC
TACTGGCCTAATACAGCCTACTAACTCCCTCTTTAAAGCTCATTCATCAT
TAGTAAATTTAGAAGAAAATTCACAAGTTACTCAAGTATCTATCTCTAAA
AAATGGATGAAATCGTCTGTTAAAAATAAACCATCCGTAATGGCATATCA
AAAAGCA

SEQ ID NO: 4807 
STRAIN M781 

TGCTTCAGTAGCTCCTACTACAAATACTATCGTTCAAACTAATGACAGTA
ATCCTACCGCAAAATTTGCATCAGAATCAGGACAATCTGTAATAGGTCAA
GTAAAACCAGCTAATTCTGCGGCGCTTACAACAGTTGACACGCCTCATAT
TTCAGCTCCAGATGCTTTAAAAACAACTCAATCAAGTCCTGTCGTTGAGA 
GTCCTTCTACTAAGTTAACTGAAGAGACATACAAACAAAAAGATGGTCAA
GATTTAGCCAACATGGTGAGAAGTGGTCAAGTTACTAGTGAGGAACTCGT
CAATATGGCATACGATATTATCGCTAAAGAAAACCCATCTTTAAATGCAG 
TCATTACTACTAGACGCCAAGAAGCCATTGAAGAGGCTAGAAAACTTAAA
GATACTAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGG 
GCACAGTATtAAAGGTGGTGAAACCAATAATGGCTTGATCTATGCAGATG
GAAAAATTAGCACATTTGACAGTAGCTATGTCAAAAAATATAAAGATTTA
GGATTTATTATTTTAGGACAAACGaATTTTCCAGAGTATGGGTGGCGTAA
TATAACAGACTCTAAATTATACGGTCCAACGCATAATCCTTGGAaTCTTG 
CTCATAACGCTGGTGGCTCTTCTGGTGGAAGTGCAGCAGCTATTGCTAGC 
GGAATGACGCCAATTGCTAGCGGCAGTGATGCTGGTGGTTCTATCCGTAT 
TCCATCTTCTTGGACGGGCTTAGTAGGTTTAAAACCAACAAGAGGATTGG
TGAGTAATGAAAAGCCAGATTCGTATAGTACAGCAGTTCATTTTCCATTA
ACTAAGTCATCTAGAGACGCAGAAACATTGTTAACTTACCTAAAGAAAAG
CGATCAAACGCTAGTATCAGTTAATGATTTAAAaTCTTTACCAATTGCTT 
ATACTTTGAAATCACCAATGGGAACAGAAgTTAGTCAAGATGCTAAAAAT
GCTATTATGGACAACGTCACATTCTTAAGAGAACAAGGATTCAAAGTGAC
AGAGATAGATTTACCAATTGATGGTAGAGCATTAATGCGTGATTATTCAA
CCTTGGCTATTGGCATGGGAGGAGCTTTTTCAACAATTGAAAAAGACTTA 
AAAAAACATGGTTTTACTAAAGAAGACGTTGATCCCATTAGTTGGGCAGT
TCATGTTATTTATCAAAATTCAGATAAGGCTGAACTTAAGAAATCTATTG 
TGGAAGCCCAAAAACATATGGATGATTATCGTAAGGCAATGGAGAAGCTT
CACAAGCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCC 
TCTAAATACAGATCCATATGTAACAGaGaAAGATAAAAGAGCGATTTATA
ATATGGAAAACTTGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAG
TGGGAGCCTATGTTGCGTAGAACACCTTTTACACCAATTGCTAATAtGAC 
AGGACTCCCAGCTATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTAC
CCATAGGGACGATGTTAATGGCAGGTGCAAACTATGATATGGTATTAATT 
AAATTTGCAACTTTCTTTGAAAAACATCATGGTTTTAATGTTAAATGGCA 
AAGAATAATAGATAAAGAAGTGAAACCATCTGCTGACCTAATACAGCCTA
CTAACTCCCTCTTTAAAGCTCATTCATCATTAGTAAATTTAGAAGAAAAT
TCACAAGTTACTCAAGTATCTATCTCTAAAAAATGGATGAAATCGTCTGT
TAAAAATAAACCATCCGTAATGGCATATCAAAAAGCA

SEQ ID NO: 4810 
STRAIN CJB110 

TAGTTCCTACTACAAATACTATCGTTCAAACTAATGACAGTAATCCTACC
GCAAAATTTGTATCAGAATCAGGACAATCTGTAATAGGTCAAGTAAAACC 
AGATAATTCTGCGGCGCTTACAACAGTTGACACGCCTCATCATATTTCAG
CTCCAGATGCTTTAAAAACAACTCAATCAAGTCCTGTCGTTGAGAGTACT 
TCTACTAAGTTAACTGAAGAGACTTACAAACAAAAAGATGGTAAAGATTT
AGCCAACATGGTGAGAAGTGGTCAAGTTACTAGTGAGGAACTCGTTAATA
TGGCATACGATATTATTGCTAAAGAAAACCCATCTTTAAATGCAGTCATT
ACTACTAGACGCCAAGAAGCTATTGAAGAGGCTAGAAAACTTAAAGATAC 
CAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCACA 
GTATTAAAGGTGGTGAAACCAATAATGGCTTGATCTATGCAGATGGAAAA
ATTAGCACATTTGACAGTAGCTATGTCAAAAAATATAAAGATTTAGGATT
TATTATTTTAGGACAAACGAACTTTCCAGAGTATGGGTGGCGTAATATAA 
CAGATTCTAAATTATACGGTCTAACGCATAATCCTTGGGATCTTGCTCAT 
AATGCTGGTGGCTCTTCTGGTGGAAGTGCAGCAGCCATTGCTAGCGGAAT 
GACGCCAATTGCTAGCGGTAGTGATGCTGGTGGTTCTATCCGTATTCCAT 
CTTCTTGGACGGGCTTGGTAGGTTTAAAACCAACAAGAGGATTGGTGAGT 
CATGAAAAGCCAGATTCGTATAGTACAGCAGTTCATTTTCCATTAACTAA
GTCATCTAGAGACGCAGAAACATTATTAACTTATCTAAAGAAAAGCGATC
AAACGCTAGTATCAGTTAATGATTTAAAATCTTTACCAATTGCTTATACT
TTGAAATCACCAATGGGAACAGAAGTTAGTCAAGATGCTAAAAACGCTAT
TATGGACAACGTCACATTCTTAAGAAAACAAGGATTCAAAGTAACAGAGA
TAGACTTACCAATTGATGGTAGAGCATTAATGCGTGATTATTCAACCTTG
GCTATTGGCATGGGAgGAGCTTTTTCAACaATTGAAAAAGAcTTAaAAAA 
AcATGGTTTTACTAAAGAAGACGTTGATCCTATTACTTGGGCAGTTCATG 
TTATTTATCAAAATTCAGATAAGGCTGAACTTAAGAAATCTATTATGGAA
GCCCAAAAACATATGGATGATTATCGTAAGGCAATGGAGAAGCTTCACAA
GCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTCTAA 
ATACAGATCCATATGTAACAGAGGAAGATAAAAGAGCGATTTATAATATG
GAAAACTTGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAGTGGGA
GCCTATGTTGCGTAGAACACCTTTTACACAAATTGCTAATAtGACAGGAC
TCCCAGCTATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTACCCATA 
gGGACgATGTTAATGGCAGGTGCAAACTATGATATGGTATTAATTAAATT
TGCAACTTTCTTTGAAAAACATCATGGTTTTAATGTTAAATGGCAAAGAA
TAATAGATAAAGAAGTGAAACCATCTACTGGCCTAATACAGCCTAGTAAC
TCCCTCTTTAAAGCTCATTCATCATTAGTAAATTTAGAAGAAAATTCACA
AGTTACTCAAGTATCTATCTCTAAAAAATGGATGAAATCGTCTGTTAAAA 
ATAAACCATCCGTAATGGCATATCAAAAAGCA

SEQ ID NO: 4811 
STRAIN 1169NT 

AATAGTACTGAGACAAGTGCTTCAGTAGCTCCTACTACAAATACTATCGT 
TCAAACTAATGACAGTAATCCTACCGCAAAATTTGCATCAGAATCAGGAC
AATCTGTAATATGTCAAGTAAAACCAGATAATTCTGCGGCGCTTACAACA
GTTGACACGCCTCATATTTCAGCTCCAGATGATTTAAAAACAACTCAATC
AAGTCCTGTCGTTGAGAGTACTTCTACTAAGTTAACTGAAGAGACATACA 
AACAAAAAGATGGTCAAGATTTAGCCAACATGGTGAGAAGTGGTCAAGTT
ACTAGTGAGGAACTCGTCAATATGGCATACGATATTATTGCTAAAGAAAA 
CCCTTCTTTAAATGCAGTCATTACTACTAGACGCCAAGAAGCCATTGAAG 
AGGCTAGAAAACTTAAAGATACTAATCAGCCATTTTTAGGTGTTCCCTTG 
TTAGTCAAGGGGTTAGGGCACAGTATTAAAGGTGGTGAAACCAATAATGG
CTTGATCTATGCAGATGGAAAAATtaGCACATTTGACAGTAGCTATGTCA
AAAAATATAAAGATTTAGGATTTATTATTTTAGGACAAACGAACTTTCCA
GAGTATGGGTGGCGTAATATAACAGATTCTAAATTATACGGTCCAACGCA
TAACCCTCGGAATCTTGCTCATAATGCTGGTGGCTCTTCTGGTGGAAGTG 
CAGCAGCCATTGCTAGCGGrATGACGCCAATTGCTAGCGGTAGTGATGCT
GGTGGTTCTATCCGtATTCCATCTTCTTGGACGGGCTTGGTAGGTTTAAA 
ACCAACAAGAGGATTGGTGAGTAATGAAAAGCCAGATTCGTATAGTACAG
CAGTTCATTTTCCATTAACTAAGTCATCTAGAGACGCAGAAACATTATTA
ACTTATCTAAAGAAAAGCGATCAAACGCTAGTATCAGTTAATGATTTAAA
ATCTTTACCAATTGCTTATACTTTGAAATCACCAATGGGAACAGAAGTTA 
GTCAAGATGCTAAAAACGCTATTATGGACAACGTCACATTCTTAAGAAAA 
CAAGGATTCAAAGTAACAGAGATAGACTTAGCAATTGATGGTAGAGCATT
AATGCGTGATTATTCAACCTTGGCTATTGGCATGGGAGGAGCTTTTTCAA 
CAATTGAAAAAGACTTAAAAAAACATGGTTTTACTAAAGAAGACGTTGAT 
CCTATTACTTGGGCAGTTCATGTTATTTATCAAAATTCAGATAAGGCTGA
ACTTAAGAAATCTATTATGGAAGCCCAAAAACATATGGATGATTATCGTA
AGGCAATGGAGAAGCTTCACAAGCAATTTCCTATTTTCTTATCGCCAACG
ACCGCAAGTTTAGCCCCTCTAAATACAGAtCCATATGTAACAGAGGAAGA
TAAAAGAGCGATTTATAATATGGAAAACTTGAGCCAAGAAGAAAGAATTG
CTCTCTTTAATCGCCAGTGGGAGCCTATGTTGCGTAGAACACCTTTTACA 
CAAATTGCTAATATGACAGGACTCCCAGCTATCAGTATCCCGACTTACTT 
ATCTGAGTCTGGTTTACCCATAGGGACGATGTTAATGGCAGGTGCAAACT
ATGATATGGTATTAATTAAATTTGCAACTTTCTTTGAAAAACATCATGGT
TTTAATGTTAAATGGCAAAGAATAATAGATAAAGAAGTGAAACCATCTAC
TGGCCTAATACAGCCTACTAACTCCCTCTTTAAAGCTCATTCATCATTAG
TAAATTTAGAAGAAAATTCACAAGTTACTCAAGTATCTATCTCTAAAAAA
TGGATGAAATCGTCTGTTAAAAATAAACCATCCGTAATGGCATATCAAAA
AGCA

SEQ ID NO: 4812 
STRAIN JM9130013 

TTCAGTAGCTCCTACTACAAATACTATCGTTCAAACTAATGACAGTAATC 
CTACCGCAAAATTTTCATCAGAATCAGGACAATCTGTAATAGGTCAAGTA
AAACCAGCTAATTCTGTGGCGCTTACAACAGTTGACACGCCTCATATTTC
AGCTCCAGATGCTTTAAAAACAACTCAATCAAGTCCTGTCGTTGAGAGTC
CTTCTACTAAGTTAACTGAAGAGACATACAAACAAAAAGATGGTCAAGAG
TTAGCCAACATGGTGAGAAGTGGTCAAGTTACTAGTGAGGAACTCGTCAA
TATGGCATACGATATTATTGCTAAAGAAAACCCATCTTTAAATGCAGTCA 
TTACTACTAGACGCCAAGAAGCTATTGAAGAGGCTAGAAAACTTAAAGAT
ACCAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCA 
CAGTATTAAAGGTGGTGAAACCAATAATGGCTTGATCTATGCAGGTGGAA
AAATTAGCACATTTGACAGTAGCTATGTCAAAAAATATAAAGATTTAGGA
TTTATTATTTTAGGACAAACGAACTTTCCAGAGTATGGATGGCGCAATAT
AACAGATTCTAAATTATACGGTCCAACGCATAACCCTTGGAATCTTGCTC 
ATAATGCTGGTGGCTCTTCTGGTGGAAGTGCAGCAGTTATTGCTAGCGGG 
ATGACGCCAATTGCTAGCGGTAGTGATGCTGGTGGTTCTATCCGTATTCC 
ATCTTCTTGGACGGGCTTGGTAGGTTTAAAACCAACAAGAGGATTGGTGA
GTAATGAAAAGCCAGATTCGTATAGTACAGCAGTTCATTTTCCATTAACT 
AAGTCATCTAGAGACGCAGAAACATTATTAACTTATCTAAAGAAAAGCGA
TCAAACGCTAGTATCAGTTAATGATTTAAAATCTTTACCAATTGCTTATA
CTTTGAAATCACCAATGGGAACAGAAGTTAGTCAAGATGCTAAAAATGCT
ATTATGGACAACGTCATATTCTTAAGAAAACAAGGATTCAAAGTGACAGA
GATAGACTTACCAATTGATGGTAGAGCATTAATGCGTGATTATTCAACCT 
TGGCTATTGGTATGGGAGGAGCTTTTTCAACAATTGAAAAAGACTTAAAA 
AAACATGGTTTTACTAAAGAAGACGTTGATCCCATTACTTGGGGAGTTCA
TGTTATTTATCAAAATTCAGATAAGGCTGAACTTAAGAAATCTATTATGG
AAGCCCAAAAACATATGGATGATTATCGTAAGGCAATGGAGAAGCTTCAC 
AAGCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTCT 
AAATACAGATCCATATGTAACAGAGGAAGATAAAAGAGCGATTTATAATA 
TGGAAAACTTGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAGTGG
GAGCCTATGTTGCGTAGAACACCTTTTACACAAATTGCTAATATGACAGG
ACTCCCAGCTATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTACCCA 
TAGGGACGATGTTAATGGCAGGTGCAAACTATGATATGGTATTAATTAAA
TTTGCAACTTTCTTTGAAAAATATCATGGTTTTAATGTTAAATGGCAAAG 
AATAATAGATAAAGAAGTGAAACCATCTACTGGCCTAATACAGCCTACTA 
ACTCCCTCTTTAAAGCTCATTCATCATTAGTAAATTTAGAAGAAAATTCA
CAAGTTACTCAAGTATCTATCTCTAAAAAATGGATGAAATCGTCTGTTAA
AAATAAACCATCCGTAATGGCATAT 

SEQ ID NO: 4813 
STRAIN H36B 

CTTCAGTAGTTCCTACTACAAATACTATCGTTCAAACTAATGACAGTAAT 
CCTACCGCAAAATTTTCATCAGAATCAGGACAATCTGTAATAGGTCAAGT
AAAACCAGCTAATTCTGTGGCGCTTACAACAGTTGAGACGCCTCATATTT
CAGCTCCAGATGCTTTAAAAACAACTCAATCAAGTCCTGTCGTTGAGAGT 
CCTTCTACTAAGTTAACTGAAGAGACATACAAACAAAAAGATGGTCAAGA 
TTTAGCCAACATGGTGAGAAGTGGTCAAGTTACTAGTGAGGAACTCGTCA 
ATATGGCATaCGATAtTATTGCTAAAGAAAACCCATCTTTAAATGCAGTC
ATTACTACTAGACGCCAAGAAGCTATTGAAGAGGCTAGAAAACTTAAAGA
TACCAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGC 
ACAGTATTAAAGGTGGTGAAACCAATAATGGCTTGATCTATGCAGGTGGA
AAAATTAGCACATTTGACAGTAGCTATGTCAAAAAATATAAAGATTTAGG
ATTTATTATTTTAGGACAAACGAACTTTCCAGAGTATGGATGGCGCAATA
TAACAGATTCTAAATTATACGGTCCAACGCATAACCCTTGGAATCTTGCT 
CATAATGCTGGTGGCTCTTCTGGTGGAAGTGCAGCAGTTATTGCTAGCGG 
GATGACGCCAATTGCTAGCGGTAGTGATGCTGGTGGTTCTATCCGTATTC 
CATCTTCTTGGACGGGCTTGGTAGGTTTAAAACCAACAAGAGGATTGGTG 
AGTAATGAAAAGCCAGATTCGTATAGTACAGCAGTTCATTTTCCATTAAC 
TAAGTCATCTAGAGACGCAGAAACATTATTAACTTATCTAAAGAAAAGCG
ATCAAACGCTAGTATCAGTTAATGATTTAAAATCTTTAGCAATTGCTTAT
ACTTTGAAATCACCAATGGGAACAGAAGTTAGTCAAGATGCTAAAAATGC 
TATTATGGACAACGTCATATTCTTAAGAAAACAAGGATTCAAAGTGACAG
AGATAGACTTACCAATTGATGGTAGAGCATTAATGCGTGATTATTCAACC
TTGGCTATTGGTATGGGAGGAGCTTTTTCAACAATTGAAAAAGACTTAAA
AAAACATGGTTTTACTAAAGAAGACGTTGATCCCATTACTTGGGCAGTTC
ATGTTATTTATCAAAATTCAGATAAGGCTGAACTTAAGAAATCTATTATG
GAAGCCCAAAAACATATGGATGATTATCGTAAGGCAATGGAGAAGCTTCA
CAAGCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTC 
TAAATACAGATCCATATGTAACAGAGGAAGATAAAAGAGCGATTTATAAT
ATGGAAAACTTGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAGTG
GGAGCCTATGTTGCGTAGAACACCTTTTACACAAATTGCTAATATGACAG 
GACTCCCAGCTATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTACCC 
ATAGGGACGATGTTAATGGCAGGTGCAAACTATGATATGGTATTAATTAA
ATTTGCAACTTTCTTTGAAAAATATCATGGTTTTAATGTTAAATGGCAAA 
GAATAATAGATAAAGAAGTGAAACCATCTACTGGCCTAATACAGCCTACT
AACTCCCTCTTTAAAGCTCATTCATCATTAGTAAATTTAGAAGAAAATTC
ACAAGTTACTCAAGTATCTATCTCTAAAAAATGGATGAAATCGTCTGTTA
AAAATAAA 

SEQ ID NO: 4814 

STRAIN 2603 frame: 1 

NSTETSASVVPTTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAP
DALKTTQSSPVVESTSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPS
LNAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFD 
SSYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIAS 
GMTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETL 
LTYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEID 
LPIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELK 
KSIMEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQ
EERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLI 
KFATFFEKHHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISK
KWMKSSVKNKPSVMAYQKA 

SEQ ID NO: 4815 

STRAIN 090 frame: 1

NSTETSASVVPTTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAP 
DALKTTQSSPVVESTSTKLTEETYKQKDGKDLANMVRSGQVTSEELVNMAYDIIAKENPS
LNAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFD 
SSYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIAS 
GMTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETL 
LTYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEID 
LPIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELK 
KSIMEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQ
EERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLI 
KFATFFEKHHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISK
KWMKSSVKNKPSVMAYQKA 

SEQ ID NO: 4816

STRAIN A909 frame: 2 

TTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAPDALKTTQSSPV 
VESTSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTRRQE 
AIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDSSYVKKYKDLG 
FIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIASGMTPIASGSDA 
GGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSDQTL
VSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDLPIDGRALMRD
YSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIMEAQKHMD
DYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQEERIALFNRQW 
EPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEKHHG 
FNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVKNKP 
SVMAYQKA 

SEQ ID NO: 4817 

STRAIN COH1 frame: 1 

NSTETSASVAPTTNTIVQTNDSNPTAKFASESGQSVIGQVKPANSAALTTVDTPHISAPD 
ALKTTQSSPVVESPSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSL
NAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDS 
SYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGPTHNPWNLAHNAGGSSGGSAAAIASG 
MTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLL
TYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDL
PIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKK 
SIVEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEKDKRAIYNMENLSQE 
ERIALFNRQWEPMLRRTPFTPIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIK 
FATFFEKHHGFNVKWQRIIDKEVKPSADLIQPTNSLFKAHSSLVNLEENSQVTQVSISKK 
WMKSSVKNKPSVMAYQKA

SEQ ID NO: 4818 

STRAIN M732 frame: 1 

SVAPTTNTIVQTNDSNPTAKFASESGQSVIGQVKPANSAALTTVDTPHISAPDALKTTQS 
SPVVESPSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTR
RQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDSSYVKKYK 
DLGFIILGQTNFPEYGWRNITDSKLYGXTHNPWDLAHNAGGSSGGSAAAIASGMTPIASG 
SDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSD 
QTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDLPIDGRAL
MRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIVEAQK 
HMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEKDKRAIYNMENLSQEERIALFN 
RQWEPMLRRTPFTPIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEK 
HHGFNVKWQRIIDKEVKPSADLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVK
NKPSVMAYQKA

SEQ ID NO: 4819 

STRAIN 18RS21 frame: 1 

NSTETSASVVPTTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAP
DALKTTQSSPVVESTSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPS
LNAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFD 
SSYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIAS 
GMTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETL
LTYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEID
LPIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELK 
KSIMEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQ
EERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLI 
KFATFFEKHHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISK
KWMKSSVKNKPSVMAYQKA

SEQ ID NO: 4820 

STRAIN M781 frame: 2

ASVAPTTNTIVQTNDSNPTAKFASESGQSVIGQVKPANSAALTTVDTPHISAPDALKTTQ 
SSPVVESPSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITT
RRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDSSYVKKY
KDLGFIILGQTNFPEYGWRNITDSKLYGPTHNPWNLAHNAGGSSGGSAAAIASGMTPIAS 
GSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKS
DQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLREQGFKVTEIDLPIDGRA
LMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIVEAQ 
KHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEKDKRAIYNMENLSQEERIALF
NRQWEPMLRRTPFTPIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFE 
KHHGFNVKWQRIIDKEVKPSADLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSV 
KNKPSVMAYQKA

SEQ ID NO: 4821 

STRAIN CJB110 frame: 3 

VPTTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAPDALKTTQSS 
PVVESTSTKLTEETYKQKDGKDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTRR
QEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDSSYVKKYKD 
LGFIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIASGMTPIASGS 
DAGGSIRIPSSWTGLVGLKPTRGLVSHEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSDQ 
TLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDLPIDGRALM 
RDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIMEAQKH 
MDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQEERIALFNR 
QWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEKH 
HGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVKN 
KPSVMAYQKA

SEQ ID NO: 4822 

STRAIN 1169NT frame: 1 

NSTETSASVAPTTNTIVQTNDSNPTAKFASESGQSVICQVKPDNSAALTTVDTPHISAPD 
DLKTTQSSPVVESTSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSL
NAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDS 
SYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGPTHNPRNLAHNAGGSSGGSAAAIASG 
MTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLL 
TYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDL
PIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKK 
SIMEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQE 
ERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIK 
FATFFEKHHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKK
WMKSSVKNKPSVMAYQKA

SEQ ID NO: 4823 

STRAIN JM9130013 frame: 2 

SVAPTTNTIVQTNDSNPTAKFSSESGQSVIGQVKPANSVALTTVDTPHISAPDALKTTQS 
SPVVESPSTKLTEETYKQKDGQELANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTR
RQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYAGGKISTFDSSYVKKYK 
DLGFIILGQTNFPEYGWRNITDSKLYGPTHNPWNLAHNAGGSSGGSAAVIASGMTPIASG 
SDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSD 
QTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVIFLRKQGFKVTEIDLPIDGRAL 
MRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWGVHVIYQNSDKAELKKSIMEAQK 
HMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQEERIALFN 
RQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEK 
YHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVK 
NKPSVMAY 

SEQ ID NO: 4824 

STRAIN H36B frame: 3 

SVVPTTNTIVQTNDSNPTAKFSSESGQSVIGQVKPANSVALTTVDTPHISAPDALKTTQS 
SPVVESPSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTR 
RQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYAGGKISTFDSSYVKKYK 
DLGFIILGQTNFPEYGWRNITDSKLYGPTHNPWNLAHNAGGSSGGSAAVIASGMTPIASG 
SDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSD 
QTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVIFLRKQGFKVTEIDLPIDGRAL 
MRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIMEAQK 
HMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQEERIALFN 
RQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEK 
YHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVK 
NK 

SEQ ID NO: 4901 
STRAIN 2603

aaacatccgatacttaatgatcaaaaatccttagcaattgttgaacagat 
agaatatgattttgataaattcgataattcagaagcttctttttatgcaa 
cattagctagawttcgcgttatggatagagaaatcaaaaaatttattaga 
gaaaatccaaatagtcaaatcctttcaattggttgtggacttgatacaag 
gtttgaaagagtcgataatggacaaattaggtggtataaccttgatttgc 
cagaggttatggagataagaaaattattttttgaagagcatgaaagagtt 
actaatatagcaaaatcagccctagatgaaacttggacacgggaggtaaa 
tccccaaaatgccccttttctaatcgtgtcagaaggtgttttaatgtttc 
taaaagaagatgacgtagagacttttcttcatatcctgacaaattcattt 
agccaatttatggcacaatttgatttgtgtcataaggaaatgattaataa 
aggaaagcaacatgatacagtaaagtatatggatacagaatttcagtttg 
gtatcacagatggtcatgagattgtggatttagaccctaaattaaagcaa 
ataaatctgattaactttacagatgagatgagcaaatttgagttaggcac 
acttcgctctttacttccaacaattcgtaaatttaataattgtttaggtg 
tgtacgaatataaagcatc 

SEQ ID NO: 4902 
STRAIN 090 

TAATGATCAAAAATCCTTAGCAATTGTTGAACAGATAGAATATGATTTTG
ATAAATTCGATAATTCAGAAGCTTCTTTTTATGCAACATTAGCTAGAATT
CGCGTTATGGATAGAGAAATCAAAAAATTTATTAGAGAAAATCCAAATAG
TCAAATCCTTTCAATTGGTTGTGGACTTGATACAAGGTTTGAAAGAGTCG 
ATAATGGACAAATTAGGTGGTATAACCTTGATTTGCCAGAgGTTATGGAG
ATAAGAAAATTATTTTTTGAAGAGCATGAAAGAGTTAGTAATATAGCAAA
ATCAGCCATAGATGAAACTTGGACACGGGAGGTAAATCCCCAAAATGCCC
CTTTTCTAATCGTGTCAGAAGGTGTTTTAATGTTTCTAAAAGAAGATGAC
GTAGAGACTTTTCTTCATATCCTGACAAATTCATTTAGCCAATTTATGGC 
ACAATTTGATTTGTGTCATAAGGAAATGATTAATAAAGGAAAGCAACATG
ATACAGTAAAGTATATGGATACAGAATTTCAGTTTGGTATCACAGATGGT
CATGAGATTGTGGATTTAGACCCTAAATTAAAGCAAATAAATCTGATTAA
CTTTACAGATGAGATGAGCAAATTTGAGTTAGGCACACTTCGCTCTTTAC
TTCCAACAATTCGTAAATTTAATAATTGTTTAGGTGTGTACGAATATAAA
GCATC 

SEQ ID NO: 4903 
STRAIN A909 

AAACATCCGATACTTAATGA
TCAAAAATCCTTAGCAATTGTTGAACAGATAGAATATGATTTTGATAAAT
TCGATAATTCAGAAGCTTCTTTTTATGCAACATTAGCTAGAATTCGCGTT
ATGGATAGAGAAATCAAAAAATTTATTAGAGAAAATCCAAATAGTCAAAT
CcTTTCaATTGGTTGTGGACTTGATACAAGGTTTGAAAGAGTCGATAATG
GACAAATTAGGTGGTATAACCTTGATTTGCCAGAGGTTATGGAGATAAGA
AAATTaTTTTTTGAAGAGCATGAAAGAGTTACTAATATAGCAAAATCAGC
CCTAGATGaAACTTGGACACGGGAGGTAAATCCCCAAAATGCCCCTTTTC 
TAATCGTGTCAGAAGGTGTTTTAATGTTtCTAAAAGAAGATGACGTAGAG
ACTTTTcTTCATATCCTGACAAATTCATTTAGCCAATTTATGGCACAATT
TGATTTGTGTCATAAGGAAATGATTAATAAAGGAAAGCAACATGATACAG
TAAAGTATATGGATACAGAATTTCAGTTTGGTATCACAGATGGTCATGAG
ATTGTGGATTTAGACCCTAAATTAAAGCAAATAAATCTGATTAACTTTAC 
AGATGAGATGAGCAAATTTGAGTTAGGCACACTTCGCTCTTTACTTCCAA 
CAATTCGTAAATTTAATAATTGTTTAGGTGTGTACGAATATAAAGCATC

SEQ ID NO: 4904 
STRAIN H36B 

AAACATCCGATACTTAATGATCAAAAATCCTTAGCA
ATTGTTGAACAGATAGAATATGATTTTGATAAATTCGATAATTCAGAAGC
TTCTTTTTATGCAaCATTAGCTAGAATTCGCGTTATGGATAGAGAAATCA 
AAAAATTTATTAGAGAAAATCCAAATAGTCATATCCTTTCAATTGGCTGT
GgACTTGATACAAGGTTTGAAAGAGTCGATAATGGACAAATTAGGTGGTA
TAACCTTGATTTGCCAGAGGTTATGGAGATAAGAAAATTATTTTTTGAAG 
AGCATGAAAGAGTTACTAATATAGCAAAATCAGCCcTAGATGAAACTTGG 
ACACGGGAGGTAAATCCCCAAAATGCCCCTTTTCTAATCGTGTCAGAAGG 
TGTTTTAATGTTTCTAAAAGAAGATGACGTAGAGACTTTTCTTCATATCC
TGACAAATTCATTTAGCGAATTTATGGCACAATTTGATTTGTGTCAgAAG
GAAATGATTAATAAAGGAAAGCAACATGATACAGTAAAGTATATGGATAC
AGAATTTCAGTTGGGTATCACAGATGGTCATGAAATTGTGGATTTAGACC 
CTAAATTAAAGCAAATAAATCTGATTAACTTTACAGATGAGATGAGCAAA
TTTGAGTTAGGCACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAA 
TAATTGTTTAGGTGTGTACGAATATAAAGCATC

SEQ ID NO: 4905 
STRAIN 18RS21 

AACATCCGATACTTAATGATCAAAAATCCTTAGCAAT
TGTTGAACAGATAGAATATGATTTTGATAAATTCGATAATTCAGAAGCTT
CTTTTTATGCAACATTAGCTAGAATTCGCGTTATGGATAGAGAAATCAAA
AAATTTATTAGAGAAAATCCAAATAGTCaAATCCTTTCAATTGGTTGTGG 
ACTTGATACAAGGTTTGAAAGAGTCGATAATGGACAAATTAGGTGGTATA 
ACCTTGATTTGCCAGAGGTTATGGAGATAAGAAAATTATTTTTTGAAGAG
CATGAAAGAGTTAGTAATATAGCAAAATCAGCCCTAGATGAAACTTGGAC
ACGGGAGGTAAATCCCCAAAATGCCCCTTTTCTAATCGTGTCAgAAGGTG
TTTTAATGTTTCTAAAAGAAGATGACGTAGAGACTTTTCTTCATATCCTG
ACAAATTCATTTAGCCAATTTATGGCACaATTTGATTTGTGTCATAaGGA
AATGATTAATAAAGGAAAGCAACATGATACAGTAAAGTATATGGATACAG
AATTTCAGTTTGGTATCACAGATGGTCATGAGATTGTGGATTTAGACCCT
AAATTAAAGCAAATAAATCTGATTAACTTTACAGATGAGATGAGCAAATT
TGAGTTAGGCACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAATA 
ATTGTTTAGGTGTGTACGAAtATAaaGCATC 

SEQ ID NO: 4906 
STRAIN M732 

AAACATCCGATACTTAATGATCAAAAATCCTTAGCAATTGTTGAACA
GATAGAATATGATTTGGATAAATTCGATAATTCAGAAGCTTCTTTTTATG
CAACATTAGCTAGAATTCGCGTTATGGATAGAGAAATCAAAAAATTTATT
AGAGAAAATCCAAATAGTCAAATCCTTTCAATTGGTTGTGGACTTGATAC
AAGGTTTGAAAGAGTCGATAATGGACAAATTAGGTGGTATAACCTTGATT
TGCCAGAGGTTATGGAGATAAGAAAATTATTTTTTGAAGAGCATGAAAGA
GTTACTAATATAGCAAAATCAGCCCTAGATGAAACTTGGACACGGGAGGT
AAATCCCCAAAATGCCCCTTTTCTAATCGTGTCAGAAGGTGTTTTAATGT 
TTCTAAAAgAAGATGACGTAGAGACTTTTCTTCAtATCCTGACAAATTCA
TTTAGCCAATTTATGGCaCAATTTGATTTGTGTCATAAGGAAATGATTAA 
TAAAGGAAAGCAACATGATACAGTAAAGTATATGGATACAGAATTTCAGT
TTGGTATCACAGATGGTCATGAGATTGTGGATTTAGACCCTAAATTAAAG
CAAATAAATCTGATTAACTTTACAGATGAGATGAGCAAATTTGAGTTAgG
CACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAATAATTGTTTAG 
GtGTGTACGAATATAAAGCATC

SEQ ID NO: 4907 
STRAIN COH1 

AAACATCCGATACTTAATGATCAAAAATCCTTAGCAA
TTGTTGAACAGATAGAATATGATTTGGATAAATTCGATAATTCAGAAGCT
TCTTTTTATGCAACATTAGCTAGAATTCGCGTTATGGATAGAGAAATCAA
AAAATTTATTAGAGAAAATCCAAATAGTCAAATCCTTTCAATTGGTTGTG
GACTTGATACAAGGTTTGAAAGAGTCGATAATGGACAAATTAGGTGGTAT
AACCTTGATTTGCCAGAGGTTATGGAGATAAGAAAATTATTTTTTGAAGA
GCATGAAAGAGTTACTAATATAGCAAAATCAGCCCTAGATGAAACTTGGA
CACGGGAGGTAAATCCCCAAAATGCCCCTTTTCTAATCGTGTCAGAAGGT 
GTTTTAATGTTTCTAAAAGAAGATGACGTAGAGACTTTTCTTCATATCCT 
GAC AAAT T C AT TT AG C C AAT T TAT GG C AC AAT T T GAT T T G T GT C AT AAGG 
AAAT GAT T AAT AAAGGAAAG CAACATGATACAGT AAAGT AT AT GGAT AC A 
GAAT T T C AGT T T G GT AT C AC AGAT GGT CAT GAGAT T G T G GAT T TAG AC C C 
T AAAT T AAAG C AAAT AAAT C T G AT T AAC T T T AC AG AT GAGAT G AGC AAAT 

TTGAGTTAGGCACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAAT 
AAT T G T T T AGG T G T G T AC GAAT AT AAAG CAT C 

SEQ ID NO: 4908 
STRAIN M781 

AAACATCCGATACTTAATGATCA
AAAATCCTTAGCAATTGTTGAACAGATAGAATATGATTTGGATAAATTCG
ATAATTCAGAAGCTTCTTTTTATGCAACATTAGCTAGAATTCGCGTTATG
GATAGAGAAATCAAAAAATTTATTAGAGAAAATCCAAATAGTCAAATCCT
TTCAATTGGTTGTGGACTTGATACAAGGTTTGAAAGAGTCGATAATGGAC
AAATTAGGTGGTATAACCTTGATTTGCCAGAGGTTATGGAGATAAGAAAA
TTATTTTTTGAAGAGCATGAAAGAGTTACTAATATAGCAAAATCAGCCCT
AGATGAAACTTGGACACGGGAGGTAAATCCCCAAAATGCCCCTTTTCTAA
TCGTGTCAGAAGGTGTTTTAATGTTTCTAAAAgAAGATGACGTAGAGACT
TTTCTTCATATCCTGACAAATtCATTTAGCCAATTTAtGGCACAATTTGA
TTTGTGTCATAAGGAAATGATTAATAAAGGAAAGCAACATGATACAGTAA
AGTATATGGATACAGAATTTCAGTTTGGTATCACAGATGGTCATGAGATT
GTGGATTTAgACCCTAAATTAAAGCAAATAAATCTGATTAACTTTACAGA
TGAGATGAGCAAATTTGAGTTAGGCACACTTCGCTCTTTACTTCCAACAA 
TTCGTAAATTTAATAATtGTTTAGGTGTGTACGAATATAAAGCATC 

SEQ ID NO: 4909 
STRAIN CJB110 

AAACATCCGATACTTAATGATCAAAAATCCTTAGCAA
TTGTTGAACAGATAGAATATGATTTTGATAAATTCGATAATTCAGAAGCT
TCTTTTTATGCAACATTAGCTAGAATTCGCGTTATGGATAGAGAAATCAA
AAAATTTATTAGAGAAAATCCAAATAGTCAAATCCTTTCAATTGGTTGTG
GACTTGATACAAGGTTTGAAAGAGTCGATAATGGACAAATTAGGTGGTAT
AACCTTGATTTGCCAGAGGTTATGGAGATAAGAAAATTATTTTTTGAAGA
GCATGAAAGAGTTACTAATATAGCAAAATCAGCCATAGATGAAACTTGGA
CACGGGAGGTAAATCCCCAAAATGCCCCTTTTCTAATCGTGTCAGAAGGT
GTTTTAATGTTTCTAAAAGAAGATGACGTAGAGACTTTTCTTCATATCCT
GACAAATTCATTTAGCCAATTTATGGCACAATTTGATTTGTGTCATAAGG
AAATGATTAATAAAGGAAAGCAACATGATACAGTAAAGTATATGGATACA
GAATTTCAGTTTGGTATCACAGATGGTCATGAGATTGTGGATTTAGACCC
TAAATTAAAGCAAATAAATCTGATTAACTTTACAGATGAGATGAGCAAAT
TTGAGTTAGGCACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAAT
AATTGTTTAGGTGTGTACGAATATAAAGCATC

SEQ ID NO: 4910 
STRAIN 1169NT 

AAACATCCGATACTTAATGATCAAAAATCCTTAGCAAT
TGTTGAACAGATAGAATATGATTTTGATAAATTCGATAATTCAGAAGCTT
CTTTTTATGCAACATTAGCTAGAATTCGCGTTATGGATAGAGAAATCAAA
AAATTTATTAGAGAAAATCCAAATAGTCATATCCTTTCTATTGGTTGTGG
ACTTGATACAAGGTTTGAAAGAGTCGATAATGGACAAATTAGGTGGTATA
ACCTTGATTTGCCAGAGGTTATGGAGATAAGAAAATTATTTTTTGAAGAG
CATGAAAGAGTTACTAATATAGCAAAATCAGCCCTAGATGAAACTTGGAC
ACAGGAGGTAAATCCCCAAAATGCCCCTTTTCTGATCGTGTCAGAAGGTG
TTTTAATGTTTCTAAAAGAAGATGACGTAGAGACTTTTcTTCATATCCTG
ACAAATTCATTTAGCCAATTTATGGCACAATTTGATTTGTGtCAGAAGGA
AATGATTAATAAAGGAAAGCAACATGATACAGTAAAGTATATGGATACAG
AATTTCAGTTTGGTATCACAGATGGTCATGAAATTGTGGATTTAGACCCT
AAATTAAAGCAAATAAATCTGATTAACTTTAGAGATGAGATGAGCAAATT
TGAGTTAGGCACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAATA
ATTGTTTAGGTGTGTACGAATATAAAGCATC

SEQ ID NO: 4911 
STRAIN JM9130013 

AGCAATTGTTGAACAGATAGAATATGATT
TTGATAAATTCGATAATTCAGAAGCTTCTTTTTATGCAACATTAGCTAGA
ATTCGCGTTATGGATAGAGAAATCAAAAAATTTATTAGAGAAAATCCAAA
TAGTCATATCCTTTCAATTGGCTGTGGACTTGATACAAGGTTTGAAAGAG
TCGATAATGGACAAATTAGGTGGTATAACCTTGATTTGCCAGAGGTTATG
GAGATAAGAAAATTATTTTTTGAAGAGCATGAAAGAGTTACTAATATAGC
AAAATCAGCCCTAGATGAAACTTGGACACGGGAGGTAAATCCCCAAAATG
CCCCTTTTCTAATCGTGTCAGAAGGTGTTTTAATGTTTCTAAAAGAAGAT
GACGTAGAGACTTTTCTTCATATCCTGACAAATTCATTTAGCCAATTTAT
GGCACAATTTGATTTGTGTCAgAAGGAAATGATTAATAAAGGAAAGCAAC
ATGATACAGTAAAGTATATGGATACAGAATTTCAGTTTGGTATCACAGAT
GGTCATGAAATTGTGGATTTAGACCCTAAATTAAAGCAAATAAATCTGAT
TAACTTTACAGATGAGATGAGCAAATTTGAGTTAGGCACACTTCGCTCTT
TACTTCCAACAATTCGTAAATTTAATAATTGTTTAGGTGTGTACGAATAT
AAAGCATC

SEQ ID NO: 4912 

STRAIN 2603 frame: 1

KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARXRVMDREIKKFIRENPNSQILSI
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4913 

STRAIN 090 frame: 2

NDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSIGCGLD
TRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSAIDETWTREVNPQNAPFLI 
VSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTEFQFGI 
TDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4914 

STRAIN A909 frame: 1 

KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4915 

STRAIN H36B frame: 1

KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSHILSI
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCQKEMINKGKQHDTVKYMDTE 
FQLGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 



SEQ ID NO: 4916 

STRAIN 18RS21 frame: 3 

HPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSIG
CGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQNA 
PFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTEF 
QFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4917 

STRAIN M732 frame: 1 

KHPILNDQKSLAIVEQIEYDLDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4918 

STRAIN COH1 frame: 1 

KHPILNDQKSLAIVEQIEYDLDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4919 

STRAIN M781 frame: 1

KHPILNDQKSLAIVEQIEYDLDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4920 

STRAIN CJB110 frame: 1 
KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSAIDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4921 

STRAIN 1169NT frame: 1 

KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSHILSI
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTQEVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCQKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 
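
Because the frame-1 translations listed here for different strains have the same length and register, strain-specific residues can be read off by a position-by-position comparison. The following is a minimal Python sketch, assuming the two sequences are already aligned to equal length (no gaps); the function name `substitutions` is hypothetical and the ambiguous residue X is simply reported as a difference. It is applied to the second 60-residue line of SEQ ID NO: 4912 (strain 2603) and of SEQ ID NO: 4921 (strain 1169NT).

```python
def substitutions(reference: str, query: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference residue, query residue) for every
    position at which two pre-aligned, equal-length sequences differ."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, qry)
        for pos, (ref, qry) in enumerate(zip(reference, query), start=1)
        if ref != qry
    ]


# Second line of SEQ ID NO: 4912 (strain 2603) and SEQ ID NO: 4921 (strain 1169NT)
strain_2603 = "GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN"
strain_1169nt = "GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTQEVNPQN"
print(substitutions(strain_2603, strain_1169nt))  # [(54, 'R', 'Q')]
```

Position 54 of this line is residue 114 of the full translation; sequences containing insertions or deletions would first need a proper alignment.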

SEQ ID NO: 4922 

STRAIN JM9130013 frame: 2

AIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSHILSIGCGLDTRFERV 
DNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQNAPFLIVSEGVL 
MFLKEDDVETFLHILTNSFSQFMAQFDLCQKEMINKGKQHDTVKYMDTEFQFGITDGHEI
VDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 
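
The "frame" annotation on each protein SEQ ID appears to give the forward reading frame in which the corresponding nucleotide SEQ ID is translated. For example, frame 1 of SEQ ID NO: 4906 (strain M732) begins KHPILNDQ... as in SEQ ID NO: 4917, and frame 2 of SEQ ID NO: 4911 (strain JM9130013) begins AIVEQ... as in SEQ ID NO: 4922. The following is a minimal Python sketch of that translation, assuming frame N means translation starting at nucleotide N of the forward strand and using the standard genetic code, whose amino-acid assignments are the same as bacterial translation table 11. The names `translate` and `CODON_TABLE` are hypothetical, and codons containing any letter other than T, C, A or G are rendered as X.

```python
# Standard genetic code; bacterial translation table 11 assigns the same amino
# acids and differs only in which codons may serve as start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nucleotides: str, frame: int) -> str:
    """Translate the forward strand starting at nucleotide `frame` (1, 2 or 3)."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    seq = "".join(nucleotides.split()).upper()  # drop OCR spacing, ignore case
    codons = (seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3))
    # Incomplete trailing codons are skipped; unknown codons become X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("AAACATCCGATACTTAATGATCAAAAATCCTTAGCAATTGTTGAACA", 1))  # KHPILNDQKSLAIVE
print(translate("AGCAATTGTTGAACAGATAGAATATGATT", 2))  # AIVEQIEYD
```

Because whitespace is stripped and case is ignored, the function also accepts the letter-spaced and mixed-case lines as they appear in this listing.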

SEQ ID NO. 5001 
STRAIN 2603 

ATGAAAAAACAAAAACTATTAGTGCTTATTGGAGGCTTATTAATAATGATAATGATGACA
GCATGTAAGGATTCAAAAATCCCAGAAAACCGCACAAAGGAAGAGTACCAAGCTGAACAA
AATTTTAAACCGTTTTTTGAGTTTTTAGCACAAAAAGATAAAGATTTGAGCAAAATACAA
AAATACTTAGTATTAGTATCGGATTCAGGTGATGCATTAGATTTAGAATATTTCTATAGT
ATTCAAGATTTAAAAAAAAATAAGGATTTAGGGAAGTTTGAAACAAGAAAAAGTCAAATA
GAAAAGCCGGGTGGCTATAATGAGTTAGAAAATAAAGAGGTCCCATTTGAATATTTTAAA
AATAATATAGTTTATCCAAAAGGAAAACCGAATATTACATTTGATGACTTTATTATCGGA
GCAATGGATACTAAAGAATTAAAAGAATTAAAAAAATTAAAAGTAAAAAGTTATTTATTA
AAACATCCGGAAACTGAGTTGAAAGATATAACATATGAATTGCCGACACAGTCGAAGCTT
ATTAAAAAA 

SEQ ID NO. 5002 

STRAIN 090 

TAAGGATTCAAAAATCCCAGAAAACCGCACAAAG
GAAGAGTACCAAGCTGAACAAAATTTTAAACTGTTTTTTGAGTTTTTAGC
ACAAAAATATAAAGATTTGAACAAAATACAAAAATACTTACTATTAGTAT
CGGATTCAGGTGATGCATTAGATTTAGAATATTTCTATAGTATTCAAGAT
TTAAAAAAAAATAAGGATTTAGGGAAGTTTGAAACAAGAAAAAGTCAAAT
AGAAAAGCCGGGTGGCTATAATGAGTTAGAAAATAAAGAGGTCCCATTTG
AATATTTTAAAAATAATATAGTTTATCCAAAAGGAAAACCGAATATTACA
TTTGATGACTTTATTATCGGAGCAATGGATACTAAAGAATTAAAAAAATT
AAAAGTAAAAAGTTATTTATTAAAACATCCGGAAACTGAGTTGAAAGATA
TAACATATGAATTGCCGACACAGTCGAAGCTTATTAAAAAA

SEQ ID NO. 5003 

STRAIN 18RS21 

TAAGGATTCAAAAATCCCAGAAAACCGCACAAAGGAAG
AGTACCAAGCTGAACAAAATTTTAAACCGTTTTTTGAGTTTTTAGCACAA
AAAGATAAAGATTTGAGCAAAATACAAAAATACTTAGTATTAGTATCGGA
TTCAGGTGATGCATTAGATTTAGAATATTTCTATAGTATTCAAGATTTAA
AAAAAAATAAGGATTTAGGGAAGTTTGAAACAAGAAAAAGTCAAATAGAA
AAGCCGGGTGGCTATAATGAGTTAGAAAATAAAGAGGTCCCATTTGAATA
TTTTAAAAATAATATAGTTTATCCAAAAGGAAAACCGAATATTACATTTG
ATGACTTTATTATCGGAGCAATGGATACTAAAGAATTAAAAGAATTAAAA
GAATTAAAAAAATTAAAAGTAAAAAGTTATTTATTAAAACATCCGGAAAC
TGAGTTGAAAGATATAACATATGAATTGCCGGCACAGTCGAAGCTTATTA
AAAAA 

SEQ ID NO. 5004 

STRAIN 2603 frame: 1

MKKQKLLLLIGGLLIMIMMTACKDSKIPENRTKEEYQAEQNFKPFFEFLAQKDKDLSKIQ
KYLLLVSDSGDALDLEYFYSIQDLKKNKDLGKFETRKSQIEKPGGYNELENKEVPFEYFK
NNIVYPKGKPNITFDDFIIGAMDTKELKELKKLKVKSYLLKHPETELKDITYELPTQSKL
IKK
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SEQ ID NO. 5005 

STRAIN 090 frame: 2 

KDSKIPENRTKEEYQAEQNFKLFFEFLAQKYKDLNKIQKYLLLVSDSGDALDLEYFYSIQ
DLKKNKDLGKFETRKSQIEKPGGYNELENKEVPFEYFKNNIVYPKGKPNITFDDFIIGAM 
DTKELKKLKVKSYLLKHPETELKDITYELPTQSKLIKK 

SEQ ID NO. 5006 

STRAIN 18RS21 frame: 2 

KDSKIPENRTKEEYQAEQNFKPFFEFLAQKDKDLSKIQKYLLLVSDSGDALDLEYFYSIQ
DLKKNKDLGKFETRKSQIEKPGGYNELENKEVPFEYFKNNIVYPKGKPNITFDDFIIGAM
DTKELKELKELKKLKVKSYLLKHPETELKDITYELPAQSKLIKK

SEQ ID NO. 5101 
STRAIN 2603 

ttgaataataaaggtgtcggtggcgatggtgtccaaatttatcaatacta 
tatcaaaatggacaacaataaaccttacttaagtcccaaagataagacta 
ctgtagagaagttagaagatcgctggaaaaaaattactttcaaagttcag 
gatactggcattggtttgaaagacgtttatcttcaatctgttaagtatgt 
tggtggtggcaataataatttagaccttatcacacctccaggatttaaaa 
aagaagataaaaaagttgaaaaaccaaaattagaccgtccaccaggaatt 
gatttaccagcaccaacttcaatgagaagttttgattattcaaccccacc 
gggaactaagccaagcaaacccaaagatagtttatcaactcctccaggtt 
tcccagatttaaacacgccgccggatgaagcaccaaaggatagtaaaaaa 
gacgctattgaagataaatcaggagcaattaaatatgctaagtctcttca 
acttagctttgttgatggccctattttagctagcaaagtaaatggcaaaa 
tattacaagtcgaatctgatggcaaattagtcattcctagaaatgctttg 
tcagctaatcaatttgatgacactagtcttaaaatttatcgtaataataa 
tcgcaataaagaaattactatcacaacagattattttgcagatacaaaat 
atgtcaatatcacagcggttgactatttgagcaatactacttttgagcaa 
ttagctactggtgaaacagtagattaccatgccattgtattttcaagctt 
tgctgctattaaagacaagggtggtaagatttatgttaacgataaattgc 
aagaaacttctcgtatagcgcttaaagataaatctgttaagattggtatt 
gaattaccaaatgatgtcagacatattgatagtttatctgttcgtcgttt 
gaatgaggttaaaactgttgataatatcttgaaaaatgatgaacaagaca 
ttaatctcagcaaaacttaccaattaaaatacaacccgacaaatcgtcgt 
ctagagtttactattaataacattaactcaagttcagaaatcatgaccac 
tttcaaagatggaaagatgccagaattggttgaacaaaaagatgtttctt 
tggatataaacgatatggacatgagtaagtttaaaactattcgacttgga 
cgaaaggattctgaatttaagggacaacttattgcaaaaactggaacagt 
tgaattagatatgtttttcaaacaatctcaagacccagcttcaattatta 
aaaaaatataccttatccaaaatggtgttccaaatgaattgaaaaaattt 
gactctagttttggtttaactgaaagtcagatagatggatactatattta 
taaagatgcaattaaccttaaatttaaattaaccagtggtgcaagtctta 
aagttgtttataaagggcaagaagatccatatagtcatcagaaagaagat 
atgactaaaaaaggtgaacagctcagtcattcaactcaagccaatgaaaa 
tacagcaaaagtaacctttgctaatattgactggtcacattatagtaagg 
ttactgtgaatggaaaagaagttgttaaaggtagtgagttacctttaact 
aaaggatggacaacatttgtattacataaaacagaaaattcattaaatgt 
taaaagtttgattatggagacgggtagtgtaagtaagaaagttcaacaac 
ttcctttaagtcctagattatctaaaaataagcatatgagggatatgcta 
cttactatgcaaaaagattcagcgtattacgaaacaagtgacagtctagt 
ccttcgaattaatctcactgcagatactaaacttaattttaatgctgtta 
aaggagcgagtgctcttactgaaaatatgatgatgagacagtttgcagtt 
gctggaccacaagatgatcctgttagtgaacataaatacccatcagtatt 
tctcttaactcctgccttattggaaactgctagtgaggcaactctaaatg 
gtaaggaaatcacagcatctggtattatcggtcacatcaaggatggtgat 
aaaagcaagcatgttgaagtcaaaatggtgaatgaaaatggagacatgct 
aggaacccctgttattattcaaggtaaagacttgactaatcgaacaaaac 
cattaatgagtggacgtagagtactttatgccggtaaacaatatgagttc
cgggctaaattaccacttagtcgttttaacacttggattagggttgaagt 
ggtaacagaagcaggagagaaagcaagtattgttcgtcgcatgttctttg 
accaatcagttccagagcttaacacagcagttgctaaacgtgatttgact 
tctgatactgctcttatccacatcgttgccaaagatgactctctaaaact 
aaaattatatcaagatgattcattacttgaatctgttgataaaaccggtc
tttatagttttagaaatggtgtagaaatcactaaagatatgacagtacca
ctagaatttggagataatattattaagttatctgctgttgacttatcaaa
ttatcgtcgtaatgagacccttcatatctatagaaaccgttttgatgtta
aagcaagccaaatgacagctgacaaaggagctaaagtaactgtggatatg
ttgatgaagcacttagttgttccagaaatggcaggagcttatacattaac
aatcgacgaagctccaaacacaaatgaatcaggaatgttaacaaacgcta
aagtatcgattcattatgtaaatggtggtgttgataaagttgatgttccg
attaaagtagttgacttagaagctattcgtaaagctgaagaagcacgtaa
agctgaagaagcacgtaaagctgaagaagcacgtaaagctgaagagggac
ataaaacccaagaagcacctatagttgaagaaggctacaaggttaataac
gttcatcaaactgatactacagttaaagcgtctgatttaccaaagactaa
gacagtttccgcagttcatatggctagaacagacaataaacagataactt
cacatcagacacatgttgaaaaacaaattaaaaatacattgccatccact
ggtgacagcaaacgtggttattatatcactggaatggctatcgttatgct
gagtgtattatttagtttagctaaaaagtttaaaagcaaatat

SEQ ID NO. 5102 
STRAIN A909 

TTGAATAATAAAGGTGTCGGTGGCGAT
GGTGTCCAAATTTATCAATACTATATCAAAATGGACAACAATAAACCTTA
CTTAAGTCCCAAAGATAAGACTACTGTAGAGAAGTTAGAAGATCGCTGGA
AAAAAATTACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGTT
TATCTTCAATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACCT
TATCACACCTCCAGGATTTAAAAAAGAAGATAAAAAAGTTGAAAAACCAA
AATTAGACCGTCCACCAGGAATTGATTTACCaCCACCAACTTCAATGAGA
AGTTTTGATTATTCAACCCCACCGGGAACTAAGCCAAGCAAACCCAAAGA
TAGTTTATCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGATG
AAGCACTAAAGGATAGTAAAAAAGACGCTATTGAAGATAAATCAGGAGCA
ATTAAATATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTATTTT
AGCTAGCAAAGTAAATGGCAAAATATTACAAGTCGAATCTGATGGCAAAT
TAGTCATTCCTAGAAATGCTTTGTCAGCTAATCAATTTGATGACACTAGT
CTTAAAATTTATCGTAATAATAATCGCAATAAAGAAATTACTATCACAAC
AGATTATTTTGCAGATACAAAATATGTCAATATCACAGCGGTTGACTATT
TGAGCAATACTACTTTTGAGCAATTAGCTACTGGTGAAACAGTAGATTAC
CATGCCATTGTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTAA
GATTTATGTTAACGATAAATTGCAAGAAACTTCTCGTATAGCGCTTAAAG
ATAAATCTGTTAAGATTGGTATTGAATTACCAAATGATGTCAGACATATT
GATAGTTTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATAT
CTTGAAAAATGATGAACAAGACATTAATCTCAGCAAAACTTACCAATTAA
AATACAACCCGACAAATCGTCGTCTAGAGTTTACTATTAATAACATTAAC
TCAAGTTCAGAAATCATGACCACTTTCAAAGATGGAAAGATGCCAGAATT
GGTTGAaCAAAAAGATGTTTCTTTGGATATAaaCGATATGGACATGAGTA
AGTTTAAAACTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGGACAA
CTTATTGCAAAAACTGGAACAGTTGAATTAGATATGTTTTTCAAACAATC
TCAAGACCCAGCTTCAATTATTAAAAAAATATACCTTATCCAAAATGGTG
TTCCAAATGAATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGAAAGT
CAGATAGATGGATACTATATTTATAAAGATGCAATTAACCTTAAATTTAA
ATTAACCAGTGGTGCAAGTCTTAAAGTTGTTTATAAAGGGCAAGAAGATC
CATATAGTCATCAGAAAGAAGATATGACTAAAAAAGGTGAACAGCTCAGT
CATTCAACTCAAGCCAATGAAAATACAGCAAAAGTAACCTTTGCTAATAT
TGACTGGTCACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGGTA
AAGGTAGTGAGTTACCTTTAACTAAAGGATGGACAACATTTGTATTACAT
AAAACAGAAAATTCATTAAATGTTAAAAGTTTGATTATGGAGACGGGTAG
TGTAAGTAAGAAAGTTCAACAACTTCCTTTAAGTCCTAGATTATCTAAAA
ATAAGCATATGAGGGATATGCTACTTACTATGCAAAAAGATTCAGCGTAT
TACGAaaCAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAGATAC
TAAACTTAATTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAAAATA
TGATGATGAGACAGTTTGCAGTTGCTGGACCACAAGATGATCCTGTTAGT
GAACATAAATACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGGAAAC
TGCTAGTGAGGCAACTCTaAATGGTAAGGAAATCACAGCATCTGGTATTA
TCGGTCACATCAAGGATGGTGATAAAAGCAAGCATGTTGAAGTCAAAATG
GTGAATGAAAATGGAGACATGCTAGGAACCCCTGTTATTATTCAAGGTAA
AGACTTGACTAATCGAACAAAACCATTAATGAGTGGACGTAGAGTACTTT
ATGCCGGTAAACAATATGAGTTCCGGGCTAAATTACCACTTAGTCGTTTT
AACACTTGGATTAGGGTTGAAGTGGTAACAGAAGCAGGAGAGAAAGCAAG
TATTGTTCGTCGCATGTTCTTTGACCAATCAGtTCCAGAGCTTAACACAG
CAGTTGCTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGTT
GCCAAAGATGACTCTCTAAAACTAAAATTATATCAAGATGATTCATTACT
TGAATCTGTTGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTAGAAA
TCACTAAAGATATGACAGTACCACTAGAATTTGGAGATAATATTATTAAG
TTATCTGCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTCATAT
CTATAGAAACCGTTTTGATGTTAAAGCAAGCCAAATGACAGCTGACAAAG
GAGCTAAAGTAACTGTGGATATGTTGATGAAGCACTTAGTTGTTCCAGAA
ATGGCAGGAGCTTATACATTAACAATCGACGAAGATCCAAACACAAATGA
ATCAGGAATGTTAACAAACGCTAAAGTATCGATTCATTATGTAAATGGTG
GTGTTGATAAAGTTGATGTTCCGATTAAAGTAGTTGACTTAGAAGCTATT
CGTAAAGCTGAAGAAGCACATAAAGCTGACGAAGCACGTAAAGCTGAAGA
AGCACGTAAAGCTGAAGAAGCACGTAAAGCTGAAGAAGCACGTAAAGCTG
AAGAGGGACATaAAACCCAAGAAGCACCTATAGTTGAAGAAGGCTACAAG
GTTAATAACGTTCATCAAACTGATACTACAGTTAAAGCGTCTGATTTACC
AAAGACTAAGACAGTTTCCGCAGTTCATATGGCTAGAACAGACAATAAAC
AGATAACTTCACATCAGACACATGTTGAAAAACAAATTAAAAATA

SEQ ID NO. 5103 
STRAIN H36B 

TGGTGTCCAAATTTATCAATACTATATCAAAATGGACAACAATAAACCTT
ACTTAAGTCCCAAAGATAAGACTACTGTAGAGAAGTTAGaaGATCGCTGG
AAAAAAATTACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGT
TTATCTTCAATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACC
TTATCACACCTCCAGGATTTAAAAAAGAAGATAAAAAAGTTGAAAAACCA
AAATTAGACCGTCCACCAGGAATTGATTTACCAGCACCAACTTCAATGAG
AAGTTTTGATTATTCAACCCCACCGGGAACTAAGCCAAGCAAACCCAAAG
ATAGTTTATCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGAT
GAAGCACTAAAGGATAGTAAAAAAGACGCTATTGAAGATAAATCAGGAGC
AATTAAATATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTATTT
TAGCTAGCAAAGTAAATGGCAAAATATTACAAGTCGAATCTGATGGCAAA
TTAGTCATTCCTAGAAATGCTTTGTCAGCTAATCAATTTGATGACACTAG
TCTTAAAATTTATCGTAATAATAATCGCAATAAAGAAATTacTATCACAA
CAGATTATTTTGCAGATACAAAATATGTCAATATCACAGCGGTTGACTAT
TTGAGCAATACTACTTTTGAGCAATTAGCTACTGGTGAAaCAGTAGATTA
CCATGCCATTGTAtTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTA
AGATTTATGTCAACGATAAATTGCAAGAAACTTCTCGTATAGCGCTTAAA
GATAAATCTGTTAAGATTGGTATTGAATTACCAAATGATGTCAGACATAT
TGATAGTTTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATA
TCTTGAAAAATGATGAACAAGACATTAATCTCAGCAAAACTTACCAATTA
AAATACAACCCGACAAATCGTCGTCTAGAGTTTACTATTAATAACATTAA
CTCAAGTTCAGAAATCATGACCACTTTCAAAGATGGAAAGATGCCAgAAT
TGGTTGAACAAAAAGATGTTTCTTTGGATATAAACGATATGGACATGAGT
AAGTTTAAAACTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGGACA
ACTTATTGCAAAAACTGGAACAGTTGAATTAGATATGTTTTTCAAACAAT
CTCAAGACCCAGCTTCAATTATTAAAAAAATATACCTTATCCAAAATGGT
GTTCCAAATGAATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGAAAG
TCAGATAGATGGATACTATATTTATAAAGATGCAATTAACCTTAAATTTA
AATTAACCAGTGGTGCAAGTCTTAAAGTTGTTTATAAAGGGCAAGAAGAT
CCATATAGtCATCAGAAAGAAGATATGACTAAAAAAGGTGAACAGCTCAG
TCATTCAACTCAAGCCAATGAAAATACAGCAAAAGTAACCTTTGCTAATA
TTGACTGGTCACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGGT
AAAGGTAGTGAGTTACCTTTAACTAAAGGATGGACAACATTTGTATTACA
TAAAACAGAAAATTCATTAAATGTTAAAAGTTTGATTATGGAGACGGGTA
GTGTAAGTAAGAAAGTTCAACAACTTCCTTTAAGTCCTAGATTATCTAAA
AATAAGCATATGAGGGATATGCTACTTAGTATGCAAAAAGATTCAGCGTA
TTACGAAACAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAGATA
CTAAACTTAATTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAAAAT
ATGATGATGAGACAGTTTGCAGTTGCTGGACCACAAGATGATCCTGTTAG
TGAACATAAATACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGGAAA
CTGCTAGTGAGGCaACTCTAAATGGTAAGGAAATCACAGCATCTGGTATT
ATCGGTCACATCAAGGATGGtGATAAAAGCAAGCATGTTGAAGTCAAAAT






WO 2004/018646 



SEQUENCE LISTING 



GGTGAATGAAAATGGAGACATGCTAGGAACCCCTGTTATTATTCAAGGTA
AAGACTTGACTAATCGAACAAAACCATTAATGAGTGGACGTAGAGTACTT
TATGCCGGTAAACAATATGAGTTCCGGGCTAAATTACCACTTAGTCGTTT
TAACaCTTGGATTAGGGTTGAAGTGGTAACAGAAGCAGGAGAGAAAGCAA
GTATTGTTCGTCGCATGTTCTTTGACCAATCAGTTCCAGAGCTTAACACA
GCAGTTGCTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGT
TGCCAAAGATGACTCTCTAAAACTAAAATTATATCAAGATGATTCATTAG
TTGAATCTGTTGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTAGAA
ATCACTAAAGATATGACAGTACCACTAGAATTTGGAGATAATATTACTAA
GTTATCTGCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTCATA
TCTATAGAAACCGTTTTGATGTTAAAGCAAGCCAAATGACAGCTGACAAA
GGAGCTAAAGTAACTGTGGATATGTTGATGAAGCACTTAGTTGTTCCAGA
AATGGCAGGAGCTTATACATTAACAATCGACGAAGCTCCAAACACAAATG
AATCAGGAATGTTAACAAACGCTAAAGTATCGATTCATTATGTAAATGGT
GGTGTTGATAAAGTTGATGTTCCGATTAAAGTAGTTGACTTAGAAGCTAT
TCGTAAAGCTGAAGAAGCACATAAAGCTGACGAAGCACGTAAAGCTGAAG
AAGCACGTAAAGCTGACGAAGCACATAAAGCTGAAGAAGTACGTAAAGCT
GAAGAAGCACATAAAGTCGAAGAAGCACGTAAAGCTGAAGAGGGACATAA
AACCCAAGAAGCACCTATAGTTGAAGAAGGCTACAAGGTTAATAACGTTC
ATCAAACTGATACTACAGTTAAAGCGTCTGATTTAGCAAAGACTAAGACA
GTTTCCGCAGTTCATATGGCTAGAACAGACAATAAACAGATAACTTCACA
TCAGACACATG

SEQ ID NO. 5104 
STRAIN 18RS21 

TTGAATAATAAAGGTGTCGGTGGCGATGGTGTCCAA
ATTTATCAATACTATATCAAAATGGACAACAATAAACCTTACTTAAGTCC
CAAAGATAAGACTACTGTAGAGAAGTTAGAAGATCGCTGGAAAAAAATTA
CTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGTTTATCTTCAA
TCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACCTTATCACACC
TCCAGGATTTAAAAAAGAAGATAAAAAAGTTGAAAAACCAAAATTAGACC
GTCCACCAGGAATTGATTTACCAGCACCAACTTCAATGAGAAGTTTTGAT
TATTCAACCCCACCGGGAACTAAGCCAAGCAAACCCAAAGATAGTTTATC
AACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGaTGAAGCACCAA
AGGATAGTAAAAAAGACGCTATTGAAGATAAATCAGGAGCAATTAAATAT
GCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTATTTTAGCTAGCAA
AGTAAATGGCAAAATATTACAAGTCGAATCTGATGGCAAATTAGTCATTC
CTAGAAATGCTTTGTCAGCTAATCAATTTGATGACACTAGTCTTAAAATT
TATCGTAATAATAATCGCAATAAAGAAATTACTATCACAACAGATTATTT
TGCAGATACAAAATATGTCAATATCACAGCGGTTGACTATTTGAGCAATA
CTACTTTTGAGCAATTAGCTACTGGTGAAACAGTAGATTAGCATGCCATT
GTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTAAGATTTATGT
TAACGATAAATTGCAAGAaACTTCTCGTATAGCGCTTAAAGATAAATCTG
TTAAGATTGGTATTGAATTACCAAATGATGTCAGACATATTGATAGTTTA
TCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATATCTTGAAAAA
TGATGAACAAGACATTAATCTCAGCAAaACTTACCAATTAAAATACAACC
CGACAAATCGTCGTCTAGAGTTTACTATTAATAACATTAACTCAAGTTCA
GAAATCATGACCACTTTCAAAGATGGAAAGATGCCAGAATTGGTTGAACA
AAAAGATGTTTCTTTGGATATaAACGATATGGACATGAGTAAGTTTAAAA
CTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGGACAACTTATTGCA
AAAACTGGAACAGTTGAATTAGATATGTTTTTCAAACAATCTCAAGACCC
AGCTTCAATTATTAAAAAAATATACCTTATCCAAAATGGTGTTCCAAATG
AATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGAAAGTCAGATAGAT
GGATACTATATTTATAAAGATGCAATTAACCTTAAATTTAAATTAACCAG
TGGTGCAAGTCTTAAAGTTGTTTATAAAGGGCAAGAAGATCCATATAGTC
ATCAGAAAGAAGATATGACTAAAAAAGGTGAACAGCTCAGTCATTCAACT
CAAGCCAATGAAAATACAGCAAAAGTAACCTTTGCTAATATTGACTGGTC
ACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGTTAAAGGTAGTG
AGTTACCTTTAACTAAAGGATGGACAACATTTGTATTACATAAAACAGAA
AATTCATTAAATGTTAAAAGTTTGATTATGGAGACGGGTAGTGTAAGTAA
GAAAGTTCAACAACTTCCTTTAAGTCCTAGATTATCTAAAAATAAGCATA
TGAGGGATATGCTACTTAGTATGCAAAAAGATTCAGCGTATTACGAAACA
AGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAGATACTAAACTTAA
TTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAAAATATGATGATGA
GACAGTTTGCAGTTGCTGGACCACAAGATGATCCTGTTAGTGAACATAAA
TACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGGAAACTGCTAGTGA
GGCAACTCTAAATGGTAAGGAAATCACAGCATCTGGTATTATCGGTCACA
TCAAGGATGGTGATAAAAGCAAGCATGTTGAAGTCAAAATGGTGAATGAA
AATGGAGACATGCTAGGAACCCCTGTTATTATTCAAGGTAAAGACTTGAC
TAATCGAACAAAACCATTAATGAGTGGACGTAGAGTACTTTATGCCGGTA
AACAATATGAGTTCCGGGCTAAATTACCACTTAGTCGTTTTAACACTTGG
ATTAGGGTTGAAGTGGTAACAGAAGCAGGAGAGAAAGCAAGTATTGTTCG
TCGCATGTTCTTTGACCAATCAGTTCCAGAGCTTAACACAGCAGTTGCTA
AACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGTTGCCAAAGAT
GACTCTCTAAAACTAAAATTATATCAAGATGATTCATTACtTGAATCTGT
TGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTAGAAATCACTAAAG
ATATGACAGTACCACTAGAATTTGGAGATAATATTATTAAGTTATCTGCT
GTTGACTTATCAAATTATCGTCGTAATGAGACCCTTCATATCTATAGAAA
CCGTTTTGATGTTAAAGCAAGCCAAATGACAGCTGACAAAGGAGCTAAAG
TAACTGTGGaTATGTTGATGAAGCACTTAGTTGTTCCAGAAATGGCAGGA
GCTTATACATTAACAATCGACGAAGCTCCAAACACAAATGAATCAGGAAT
GTTAACAAACGCTAAAGTATCGATTCATTATGTAAATGGTGGTGTTGATA
AAGTTGATGTTCCGATTAAAGTAGTTGACTTAGAAGCTATTCGTAAAGCT
GAAGAAGCACGTAAAGCTGAAGAAGCACGTAAAGCTGAAGAGGGACATAA
AACCCAAGAAGCACCTATAGTTGAAGAAGGCTACAAGGTTAATAACGTTC
ATCAAACTGATACTACAGTTAAAGCGTCTGATTTACCAAAGACTAAGACA
GTTTCCGCAGTTCATATGGCTAGAACAGACAATAAACAGATAACTTCACA
TCAGACACATGTTGAA

SEQ ID NO. 5105 
STRAIN M732 

TTGAATAATAAAGGTGTCGGTGGCGATGGTGTCC
AAATTTATCAATACTATATCAAAATGGACAACAATAAACCTTACTTAAGT
CCCAAAGATAAGACTAGTGTAGAGAAGTTAGAAGATCGCTGGAAAAAAAT
TACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGTTTATCTTC
AATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACCTTATCACA
CCTCCAGGATTTAAAAAAGAAGATAAAAAAGTTGAAAAACCAAAATTAGA
CCGTCCacCAGGAATTGATTTACCAGCACCAACTTCAATGAGAAGTTTTG
ATTATTCAACCCCACCGGGAACTAAGCCAAGCAAACCCAAAGATAGTTTA
TCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGATGAAGCCAC
CAAAGGATAGTAAAAAAGACGCTATTGAAGATAAATCAGGAGCAATTAAA
TATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTATTTTAGCTAG
CAAAGTAAATGGCAAAATATTACAAGTCGAATCTGATGGCAAATTAGTCA
TTCCTAGAAATGCTTTGTCAGCTAATCAATTTGATGACACTAGTCTTAAA
ATTTATCGTAATAATAATCGCAATAAAGAAATTACTATCACAACAGATTA
TTTTGCAGATACAAAATATGTCAATATCACAGCGGTTGACTATTTGAGCA
ATACTACTTTTGAGCAATTAGCTACTGGTGAAACAGTAGATTACCATGCC
ATTGTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTAAGATTTA
TGTTAACGATAAATTGCAAGAAACTTCTCGTATAGCGCTTAAAGATAAAT
CTGTTAAGATTGGTATTGAATTACCAAATGATGTCAGACATATTGATAGT
TTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATATCTTGAA
AAATGATGAACAAGACATTAATCTCAGCAAAACTTACCAATTAAAATACA
ACCCGACAAATCGTCGTCTAGAGTTTAGTATTAATAACATTAACTCAAGT
TCAGAAATCATGACCACTTTCAAAGATGGAAAGATGCCAGAATTGGTTGA
ACAAAAAGATGTTTCTTTGGATATAAACGATATGGACATGAGTAAGTTTA
AAACTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGGACAACTTATT
GCAAAAACTGGAACAGTTGAATTAGATATGTTTTTCAAACAATCTCAAGA
CCCAGCTTCAATTATTAAAAAAATATACCTTATCCAAAATGGtGTTCCAA
ATGAATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGAAAGTCAGATA
GATGGATACTATATTTATAAAGATGCAATTAACCTTAAaTTTAAATTAAC
CAGTGGTGCAAGTCTTAAAGTTGTTTATAAAGGGCAAGAAGATCCATATA
GTCATCAGAAAGAAGATATGACTAAAAAAGGTGAACAGCTCAGTCATTCA
ACTCAAGCCAATGAAAATACAGCAAAAGTAACCTTTGCTAATATTGACTG
GTCACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGGTAAAGGTA
GTGAGTTACCTTTAACTAAAGGATGGACAACATTTGTATTACATAAAACA
GAAAATTCATTAAATGTTAAAAGTTTGATTATGGAGACGGGTAGTGTAAG
TAAGAAAGTTCAACAACTTcCTTTAAGTCCTAGATTATCTAAAAATAAGC
ATATGAGGGATATGCTACTTACTATGCAAAAAGATTCAGCGTATTACGAA
ACAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAGATACTAAACT
TAATTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAAAATATGATGA
TGAGACAGTTTGCAGTTGCTGGAGCACAAGATGATCCTGTTaGTGAACAT
AAATACCCATCAGTaTTTCTCTTAACTCCTGCCTTATTGGAAaCTGCTAG
TGAGGCAACTCTAAATGGTAAGGAAATCACAGCATCTGGTATTATCGGTC
ACATCAAGGATGGTGATAAAAGCAAGCATGTTGAAGTCAAAATGGTGAAT
GAAAATGGAGACATGCTAGGAACCCCTGTTATTATTCAAGGTAAAGACTT
GACTAATCGAACAAAACCATTAATGAGTGGACGTAGAGTACTTTATGCCG
GTAAACAATATGAGTTCCGGGCTAAATTACCACTTAGtCGTTTTAACACT
TGGATTAGGGTTGAAGTGGTAACAGAAGCAGGAGAGAAAGCAAGTATTGT
TCGTCGCATGTTCTTTGACCAATCAGTTCCAGAGCTTAACACAGCAGTTG
CTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGTTGCCAAA
GATGACTCTCTAAAACTAAAATTATATCAAGATGATTCATTACTTGAATC
TGTTGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTAGAAATCACTA
AAGATATGACAGTACCACTAGAATTTGGAGATAATATTATTAAGTTATCT
GCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTCATATCTATAG
AAACCGTTTTGATGTTAAAGCAAGCCAAATGACAGCTGACAAAGGAGCTA
AAGTAACTGTGGATATGTTGATGAAGCACTTAGTTGTTCCAGAAATGGCA
GGAGCTTATACATTAACAATCGACGAAGCTCCAAACACAAATGAATCAGG
AATGTTAACAAACGCTAAAGTATCGATTCATTATGTAAATGGTGGTGTTG
ATAAAGTTGATGTTCCGATTAAAGTAGTTGACTTAGAAGCTATTCGTAAA
GCTGAAGAAGCACATAAAGCTGACGAAGCACGTAAAGCTGAAGAAGCACG
TAAAGCTGAAGAAGCACATAAAGCTGAAGAAGTACGTAAAGCTGAAGAAG
CACATAAAGTCGAAGAAGCACGTAAAGCTGAAGAGGGACATAAAACCCAA
GAAGCACCTATAGTTGAAGAAGGCTACAAAGTTAATAACGTTCATCAAAC
TGATACTACAGTTAAAGCGTCTGATTTACCAAAGACTAAGACAGTTTCCG
CAGTTCATATGGCTAGAACAGACAATAAACAGATAACTTCACATCAGACA
CATGTTGAAAA

SEQ ID NO. 5106 
STRAIN COH1 

TTGAATAATAAAGGTGTCGGTGGCGATGGT
GTCCAAATTTATCAATACTATATCAAAATGGACAACAATAAACCTTACTT
AAGTCCCAAAGATAAGACTACTGTAGAGAAGTTAGAAGATCGCTGGAAAA
AAATTACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGTTTAT
CTTCAATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACCTTAT
CACACCTCCAGGATTTAAAAAAGAAGATAAAAAAGTTGAAAAACCAAAAT
TAGACCGTCCACCAGGAATTGATTTACCAGCACCAACTTCAATGAGAAGT
TTTGATTATTCAACCCCACCGGGAACTAAGCCAAGCAAACCCAAAGATAG
TTTATCAACTCCTCCAGGtTTCCCAGATTTAAACACGCCGCCGGATGAAG
CCaCCAAAGGATAGTAAAAAAGACGCTATTGAAGATAAATCAGGAGCAAT
TAAATATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTATTTTAG
CTAGCAAAGTAAATGGCAAAATATTACAAGTCGAATCTGATGGCAAATTA
GTCATTCCTAGAAATGCTTTGTCAGCTAATCAATTTGATGACACTAGTCT
TAAAATTTATCGTAATAATAATCGCAATAAAGAAATTACTATCACAACAG
ATTATTTTGCAGATACAAAATATGTCAATATCACAGCGGTTGACTATTTG
AGCAATACTACTTTTGAGCAATTAGCTACTGGTGAAACAGTAGATTACCA
TGCCATTGTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTAAGA
TTTATGTTAACGATAAATTGCAAGAAACTTCTCGTATAGCGCTTAAAGAT
AAATCTGTTAAGATTGGTATTGAATTACCAAATGATGTCAGACATATTGA
TAGTTTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATATCT
TGAAAAATGATGAACAAGACATTAATCTCAGCAAAACTTAGCAATTAAAA
TACAACCCGACAAATCGTCGTCTAGAGTTTACTATTAATAACATTAACTC
AAGTTCAGAAATCATGACCACTTTCAAAGATGGAAAGATGCCAGAATTGG
TTGAACAAAAAGATGTTTCTTTGGATATAAACGATATGGACATGAGTAAG
TTTAAAACTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGGACAACT
TATTGCAAAAACTGGAACAGTTGAATTAGATATGTTTTTCAAACAATCTC
AAGACCCAGCTTCAATTATTAAAAAAATATACCTTATCCAAAATGGTGTT
CCAAATGAATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGAAAGTCA
GATAGATGGATACTATATTTATAAAGATGCAATTAACCTTAAATTTAAAT
TAACCAGTGGTGCAAGTCTTAAAGTTGTTTATAAAGGGCAAGAAGATCCA
TATAGTCATCAGAAAGAAGATATGACTAAAAAAGGTGAACAGCTCAGTCA
TTCAACTCAAGCCAATGAAAATACAGCAAAAGTAACCTTTGCTAATATTG
ACTGGTCACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGGTAAA
GGTAGTGAGTTACCTTTAACTAAAGGATGGACAACATTTGTATTACATAA
AACAGAAAATTCATTAAATGTTAAAAGTTTGATTATGGAGACGGGTAGTG
TAAGTAAGAAAGTTCAACAACTTCCTTTAAGTCCTAgATTATCTAAAAAT
AAGCATATGAGGGATATGCTACTTACTATGCAAAAAGATTCAGCGTATTA
CGAAACAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAGATACTA
AACTTAATTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAAAATATG
ATGATGAGACAGTTTGCAGTTGCTGGACCACAAGATGATCCTGTTAGTGA
ACATAAATACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGGAAACTG
CTAGTGAGGCAACTCTAAATGGTAAGGAAATCACAGCATCTGGTATTATC
GGTCACATCAAGGATGGTGATAAAAGCAAGCATGTTGAAGTCAAAATGGT
GAATGAAAATGGAGACATGCTAGGAACCCCTGTTATTATTCAAGGTAAAG
ACTTGACTAATCGAACAAAACCATTAATGAGTGGACGTAGAGTACTTTAT
GCCGGTAAACAATATGAGTTCCGGGCTAAATTACCACTTAGTCGTTTTAA
CACTTGGATTAGGGTTGAAGTGGTAACAGAAGCAGGAGAGAAAGCAAGTA
TTGTTCGTCGCATGTTCTTTGACCAATCAGTTCCAGAGCTTAACACAGCA
GTTGCTAAACGTGATTtGACTTCTGATACTGCTCTTATCCACATCGTTGC
CAAAGATGACTCTCTAAAaCTAAAATTATATCAAGATGATTCATTACTTG
AATCTGTTGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTAGAAATC
ACTAAAGATATGACAGTACCACTAGAATTTGGAGATAATATTATTAAGTT
ATCTGCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTCATATCT
ATAGAAACCGTTTTGATGTTAAAGCAAGCCAAATGACAGCTGACAAAGGA
GCTAAAGTAACTGTGGATATGTTGATGAAGCACTTAGTTGTTCCAGAAAT
GGCAGGAGCTTATACATTAACAATCGACGAAGCTCCAAACACAAATGAAT
CAGGAATGTTAACAAACGCTAAAGTATCGATTCATTATGTAAATGGTGGT
GTTGATAAAGTTGATGTTCCGATTAAAGTAGTTGACTTAGAAGCTATTCG
TAAAGCTGAAGAAGCACATAAAGCTGACGAAGCACGTAAAGCTGAAGAAG
CACGTAAAGCTGAAGAAGCACATAAAGCTGAAGAAGTACGTAAAGCTGAA
GAAGCACATAAAGTCGAAGAAGCACGTAAAGCTGAAGAGGGACATAAAAC
CCAAGAAGCACCTATAGTTGAAGAAGGCTACAAAGTTAATAACGTTCATC
AAACTGATACTACAGTTAAAGCGTCTGATTTACCAAAGACTAAGACAGTT
TCCGCAGTTCATATGGCTAGAACAGACAATAAACAGATAACTTCACATCA 
GACACATGT 

SEQ ID NO. 5107 
STRAIN M781 

TTGAATAATAAAGGTGTCGGTGGCGATGGT
GTCCAAATTTATCAATACTATATCAAAATGGACAACAATAAACCTTACTT
AAGTCCCAAAGATAAGACTACTGTAGAGAAGTTAGAAGATCGCTGGAAAA
AAATTACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGTTTAT
CTTCAATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACCTTAT
CACACCTCCAGGATTTAAAAAAGAAGATAAAAAAGTTGAAAAACCAAAAT
TAGACCGTCCACCAGGAATTGATTTACCAGCACCAACTTCAATGAGAAGT
TTTGATTATTCAACCCCACCGGGAACTAAGCCAAGCAAACCCAAAGATAG
TTTATCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGATGAAG
CCaCCAAAGGATAGTAAAAAAGACGCTATTGAAGATAAATCAGGAGCAAT
TAAATATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTATTTTAG
CTAGCAAAGTAAATGGCAAAATATTACAAGTCGAATCTGATGGCAAATTA
GTCATTCCTAGAAATGCTTTGTCAGCTAATCAATTTGATGACACTAGTCT
TAAaATTTATCGTAATAATAATCGCAATAAAGAAATTaCTATCACAACAG
ATTATTTTGCAGATACAAAATATGTCAATATCACAGCGGTTGACTATTTG
AGCAATACTACTTTTGAGCAATTAGCTACTGGTGAAACAGTAGATTACCA
TGCCATTGTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTAAGA
TTTATGTTAACGATAAATTGCAAGAAACTTCTCGTATAGCGCTTAAAGAT
AAATCTGTTAAGATTGGTATTGAATTACCAAATGATGTCAGACATATTGA
TAGTTTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATATCT
TGAAAAATGATGAACAAGACATTAATCTCAGCAAAACTTACCAATTAAAA
TACAACCCGACAAATCGTCGTCTAGAGTTTACTATTAATAACATTAACTC
AAGTTCAGAAATCATGACCACTTTCAAAGATGGAAAGATGCCAGAATTGG
TTGAACAAAAAGATGTTTCTTTGGATATAAACGATATGGACATGAGTAAG
TTTAAAACTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGGACAACT
TATTGCAAAAACTGGAACAGTTGAATTAGATATGTTTTTCAAACAATCTC
AAGACCCAGCTTCAATTATTAAAAAAATATACCTTATCCAAAATGGTGTT
CCAAATGAATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGAAAGTCA
GATAGATGGATACTATATTTATAAAGATGCAATTAACCTTAAATTTAAAT
TAACCAGTGGTGCAAGTCTTAAAGTTGTTTATAAAGGGCAAGAAGATCCA
TATAGTCATCAGAAAGAAGATATGACTAAAAAAGGTGAACAGCTCAGTCA
TTCAACTCAAGCCAATGAAAATACAGCAAAAGTAACCTTTGCTAATATTG
ACTGGTCACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGGTAAA
GGTAGTGAGTTACCTTTAACTAAAGGATGGACAACATTTGTATTACATAA
AACAGAAAATTCATTAAATGTTAAAAGTTTGATTATGGAGACGGGTAGTG
TAAGTAAGAAAGTTCAACAACTTCCTTTAAGTCCTAGATTATCTAAAAAT
AAGCATATGAGGGATATGCTACTTAGTATGCAAAAAGATTCAGCGTATTA
CGAAACAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAGATACTA
AACTTAATTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAAAATATG
ATGATGAGACAGTTTGCAGTTGCTGGACCACAAGATGATCCTGTTAGTGA
ACATAAATACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGGAAACTG
CTAGTGAGGCAACTCTAAATGGTAAGGAAATCACAGCATCTGGTATTATC
GGTCACATCAAGGATGGTGATAAAAGCAAGCATGTTGAAGTCAAAATGGT
GAATGAAAATGGAGACATGCTAGGAACCCCTGTTATTATTCAAGGTAAAG
ACTTGACTAATCGAACAAAACCATTAATGAGTGGACGTAGAGTACTTTAT
GCCGGTAAACAATATGAGTTCCGGGCTAAATTACCACTTAGTCGTTTTAA
CACTTGGATTAGGGTTGAAGTGGTAACAGAAGCAGGAGAGAAAGCAAGTA
TTGTTCGTCGCATGTTCTTTGACCAATCAGTTCCAGAGCTTAACACAGCA
GTTGCTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGTTGC
CAAAGATGACTCTCTAAAACTAAAATTATATCAAGATGATTCATTACTTG
AATCTGTTGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTAGAAATC
ACTAAAGATATGACAGTACCACTAGAATTTGGAGATAATATTATTAAGTT
ATCTGCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTCATATCT
ATAGAAACCGTTTTGATGTTAAAGCAAGCCAAATGACAGCTGACAAAGGA
GCTAAAGTAACTGTGGATATGTTGATGAAGCACTTAGTTGTTCCAGAAAT
GGCAGGAGCTTATACATTAACAATCGACGAAGCTCCAAACACAAATGAAT
CAGGAATGTTAACAAACGCTAAAGTATCGATTCATTATGTAAATGGTGGT
GTTGATAAAGTTGATGTTCCGATTAAAGTAGTTGACTTAGAAGCTATTCG
TAAAGCTGAAGAAGCACATAAAGCTGACGAAGCACGTAAAGCTGAAGAAG
CACGTAAAGCTGAAGAAGCACATAAAGCTGAAGAAGTACGTAAAGCTGAA
GAAGCACATAAAGTCGAAGAAGCACCGTAAAGCTGAAGAGGGACATAAAA
CCCAAGAAGCACCTATAGTTGAAGAAGGCTACAAAGTTAATAACGTTCAT
CAAACTGATACTACAGTTAAAGCGTCTGATTTACCAAAGACTAAGACAGT
TTCCGCAGTTCATATGGCTAGAACAGACAATAAACAGATAACTTCACATC
AGACACATGTTG

SEQ ID NO. 5109 
STRAIN JM9130013 

TGGTGTCCAAATTTATCAATACTATATCAAAATGGACAACAATAAAC
CTTACTTAAGTCCCAAAGATAAGACTACTGTAGAGAAGTTAGAAGATCGC
TGGAAAAAAATTACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGA
CGTTTATCTTCAATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAG
ACCTTATCACACCTCCAGGATTTAAAAAAGAAGATAAAAAAGTTGAAAAA
CCAAAATTAGACCGTCCACCAGGAATTGATTTACCAGCACCAACTTCAAT
GAGAAGTTTTGATTATTCAACCCCACCGGGAACTAAGCCAAGCAAACCCA
AAGATAGTTTATCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCG
GATGAAGCACCAAAGGATAGTAAAAAAGACGCTATTGAAGATAAATCAGG
AGCAATTAAATATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTA
TTTTAGCTAGCAAAGTAAATGGCAAAATATTACAAGTCGAATCTGATGGC
AAATTAGTCATTCCTAGAAATGCTTTGTCAGCTAATCAATTTGATGACAC
TAGTCTTAAAATTTATCGTAATAATAATCGCAATAAAGAAATTACTATCA
CAACAGATTATTTTGCAGATACAAAATATGTCAATATCACAGCGGTTGAC
TATTTGAGCAaTACTACTTTTGAGCAATTAGCTACTGGTGAAACAGTAGA
TTACCATGCCATTGTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTG
GTAAGATTTATGTTAACGATAAATTGCAAGAAACTTCTCGTATAGCGCTT
AAAGATAAATCTGTTAAGATTGGTATTGAATTACCAAATGATGTCAGACA
TATTGATAGTTTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATA
ATATCTTGAAAAATGATGAACAAGACATTAATCTCAGCAAAACTTACCAA
TTAAAATACAACCCGACAAATCGTCGTCTAGAGTTTACTATTAATAACAT
TAACTCAAGTTCAGAAATCATGACCACTTTCAAAGATGGAAAGATGCCAG
AATTGGTTGAACAAAAAGATGTTTCTTTGGATATAAACGATATGGACATG
AGTAAGTTTAAAACTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGG
ACAACTTATTGCAAAAACTGGAACAGTTGAATTAGATATGTTTTTCAAAC
AATCTCAAGACCCAGCTTCAATTATTAAAAAAATATACCTTATCCAAAAT
GGTGTTCCAAATGAATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGA
AAGTCAGATAGATGGATACTATATTTATAAAGATGCAATTAACCTTAAAT
TTAAATTAACCAGTGGTGCAaGTCTTAAAGTTGTTTATAAAGGGCAAGAA
GATCCATATAGTCATCAGAAAGAAGATATGACTAAAArAGGTGAACAGCT
CAGTCATTCAACTCAAGCCAATGAAAATACAGCAAAAGTAACCTTTGCTA
ATATTGACTGGTCACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTT
GGTAAAGGTAGTGAGTTACCTTTAACTAAAGGATGGACAACATTTGTATT
ACATAAAACAGAAAATTCATTAAATGTTAAAAGTTTGATTATGGAGACGG
GTAGTGTAAGTAAGAAAGTTCAACAACTTCCTTTAAGTCCTAGATTATCT
AAAAATAAGCATATGAGGGATATGCTACTTACTATGCAAAAAGATTCAGC
GTATTACGAAACAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAG
ATACTAAACTTAATTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAA
AATATGATGATGAGACAGTTTGCAGTTGCTGGACCACAAGATGATCCTGT
TAGTGAACATAAATACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGG
AAACTGCTAGTGAGGCAACTCTAAATGGTAAGGAAATCACAGCATCTGGT
ATTATCGGTCACATCAAGGATGGTGATAAAAGCAAGCATGTTGAAGTCAA
AATGGTGAATGAAAATGGAGACATGCTAGGAACCCCTGTTATTATTCAAG
GTAAAGACTTGACTAATCGAACAAAACCATTAATGAGTGGACGTAGAGTA
CTTTATGCCGGTAAACAATATGAGTTCCGGGCTAAATTACCACTTAGTCG
TTTTAACACTTGGATTAGGGTTGAAGTGGTAACAGAAGCAGGAgaGaaag
cAaGTATTGTTCGTCGCATGTTCTTTGACCAATCAGTTCCAGAGCTTAAC
ACAGCAGTTGCTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACAT
CGTTGCCAAAGATGACTCTCTAAAACTAAAATTATATCAAGATGATTCAT
TACTTGAATCTGTTGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTA
GAAATCACTAAAGATATGACAGTACCACTAGAATTTGGAGATAATATTAT
TAAGTTATCTGCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTC
ATATCTATAGAAACCGTTTTGATGTTAAAGCAAGCCAAATGACAGCTGAC
AAAGGAGCTAAAGTAACTGTGGATATGTTGATGAAGCACTTAGTTGTTCC
AGAAATGGCAGGAGCTTATACATTAACAATCGACGAAGCTCCAAACACAA
ATGAATCAGGAATGTTAACAAACGCTAAAGTATCGATTCATTATGTAAAT
GGTGGTGTTGATAAAGTTGATGTTCCGATTAAAGTAGTTGACTTAGAAGC
TATTCGTAAAGCTGAAGAAGCACATAAAGCTGACGAAGCACGTAAAGCTG
AAGAAGCACGTAAAGCTGAAGAAGCACATAAAGCTGAAGAAGTACGTAAA
GCTGAAGAAGCACATAAAGTCGAAGAAGCACCGTAAAGCTGAAGAGGGAC
ATAAAACCCAAGAAGCACCTATAGTTGAAGAAGGCTACAAGGTTAATAAC
GTTCATCAAACTGATACTACAGTTAAAGCGTCTGATTTACCAAAGACTAA
GACAGTTTCCGCAGTTCATATGGCTAGAACAGACAATAAACAGATAACTT
CACATCAGACACATGTTG

SEQ ID NO. 5110 
STRAIN 2603 frame: 1

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY 
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEAPKDSKKDAIEDKSGAIKYAKSLQLSFVDGPILASKV 
NGKILQVESDGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAV 
DYLSNTTFEQLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGI 
ELPNDVRHIDSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINS
SSEIMTTFKDGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELD 
MFFKQSQDPASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSG 
ASLKVVYKGQEDPYSHQKEDMTKKGEQLSHSTQANENTAKVTFANIDWSHYSKVTVNGKE
WKGSELPLTKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDML 
LTMQKDSAYYETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSE 
HKYPSVFLLTPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTP 
VIIQGKDLTNRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEVVTEAGEKASIVRR 
MFFDQSVPELNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNG 
VEITKDMTVPLEFGDNIIKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDM 
LMKHLWPEMAGAYTLTIDEAPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKVVDLEAIR
KAEEARKAEEARKAEEARKAEEGHKTQEAPIVEEGYKVNNVHQTDTTVKASDLPKTKTVS
AVHMARTDNKQITSHQTHVEKQIKNTLPSTGDSKRGYYITGMAIVMLSVLFSLAKKFKSK 
Y 

SEQ ID NO. 5111 

STRAIN A909 frame: 1 
LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPPPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEALKDSKKDAIEDKSGAIKYAKSLQLSFVDDPILASKV 
NGKILQVESDGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAV 
DYLSNTTFEQLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGI 
ELPNDVRHIDSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINS 
SSEIMTTFKDGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELD 
MFFKQSQDPASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSG 
ASLKWYKGQEDPYSHQKEDMTKKGEQLSHSTQANENTAKVTFANIDWSHYSKVTVNGKE 
VGKGSELPLTKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDML 
LTMQKDSAYYETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSE 
HKYPSVFLLTPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTP
VIIQGKDLTNRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEWTEAGEKASIVRR 
MFFDQSVPELNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNG 
VEITKDMTVPLEFGDNIIKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDM 
LMKHLWPEMAGAYTLTIDEDPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKVVDLEAIR 
KAEEAHKADEARKAEEARKAEEARKAEEARKAEEGHKTQEAPIVEEGYKVNNVHQTDTTV 
KASDLPKTKTVSAVHMARTDNKQITSHQTHVEKQIKN 

SEQ ID NO. 5112 

STRAIN H36B frame: 2

GVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVYLQSVKYVGG 
GNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTKPSKPKDSLS 
TPPGFPDLNTPPDEALKDSKKDAIEDKSGAIKYAKSLQLSFVDDPILASKVNGKILQVES 
DGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAVDYLSNTTFE 
QLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGIELPNDVRHI 
DSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINSSSEIMTTFK 
DGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELDMFFKQSQDP 
ASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSGASLKWYKG 
QEDPYSHQKEDMTKKGEQLSHSTQANENTAKVTFANIDWSHYSKVTVNGKEVGKGSELPL 
TKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDMLLTMQKDSAY 
YETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSEHKYPSVFLL 
TPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTPVIIQGKDLT
NRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEWTEAGEKASIVRRMFFDQSVPE 
LNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNGVEITKDMTV 
PLEFGDNITKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDMLMKHLVVPE 
MAGAYTLTIDEAPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKWDLEAIRKAEEAHKAD 
EARKAEEARKADEAHBCAEEVRKAEEAHKVEEARKAEEGHKTQEAPIVEEGYBCVNNVHQTD 
TTVKASDLPKTKTVSAVHMARTDNKQITSHQTH 

SEQ ID NO. 5113 

STRAIN 18RS21 frame: 1 

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY 
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEAPKDSKKDAIEDKSGAIKYAKSLQLSFVDDPILASKV 
NGKILQVESDGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAV
DYLSNTTFEQLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGI
ELPNDVRHIDSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINS 
SSEIMTTFKDGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELD 
MFFKQSQDPASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSG 
ASLKWYKGQEDPYSHQKEDMTKKGEQLSHSTQANENTABCVTFANIDWSHYSKVTVNGKE 
WKGSELPLTKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDML 
LTMQKDSAYYETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSE
HKYPSVFLLTPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTP 
VIIQGKDLTNRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEVVTEAGEKASIVRR 
MFFDQSVPELNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNG 
VEITKDMTVPLEFGDNIIKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDM 
LMKHLVVPEMAGAYTLTIDEAPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKVVDLEAIR 
KAEEARKAEEARKAEEGHKTQEAPIVEEGYKVNNVHQTDTTVKASDLPKTKTVSAVHMAR 
TDNKQITSHQTHVE 

SEQ ID NO. 5114 

STRAIN M732 frame: 1

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK
PSKPKDSLSTPPGFPDLNTPPDEATKG..KRRY.R.IRSN.IC.VSST.LC..PYFS.QS
KWQNITSRI.WQISHS.KCFVS.SI..H.S.NLS...SQ.RNYYHNRLFCRYKICQYHSG
.LFEQYYF.AISYW.NSRLPCHCIFKLCCY.RQGW.DLC.R.IARNFSYSA.R.IC.DWY
.ITK.CQTY..FICSSFE.G.NC..YLEK..TRH.SQQNLPIKIQPDKSSSRVYY..H.L
KFRNHDHFQRWKDARIG.TKRCFFGYKRYGHE.V.NYSTWTKGF.I.GTTYCKNWNS.IR
YVFQTISRPSFNY.KNIPYPKWCSK.IEKI.L.FWFN.KSDRWILYL.RCN.P.I.INQW
CKS.SCL.RARRSI.SSERRYD.KR.TAQSFNSSQ.KYSKSNLC.Y.LVTL..GYCEWKR
SW.R..VTFN.RMDNICIT.NRKFIKC.KFDYGDG.CK.ESSTTSFKS.U.K.AYEGYA
TYYAKRFSVLRNK.QSSPSN.SHCRY.T.F.CC.RSECSY.KYDDETVCSCWTTR.SC..
T.IPISISLNSCLIGNC..GNSKW.GNHSIWYYRSHQGW..KQAC.SQNGE.KWRHARNP
CYYSR.RLD.SNKTINEWT.STLCR.TI.VPG.ITT.SF.HLD.G.SGNRSRRESKYCSS
HVL.PISSRA.HSSC.T.FDF.YCSYPHRCQR.LSKTKIISR.FIT.IC..NRSL.F.KW
CRNH.RYDSTTRIWR.YY.VICC.LIKLSS..DPSYL.KPF.C.SKPNDS.QRS.SNCGY
VDEALSCSRNGRSLYINNRRSSKHK.IRNVNKR.SIDSLCKWWC..S.CSD.SS.LRSYS
.S.RST.S.RST.S.RST.S.RST.S.RST.S.RST.SRRST.S.RGT.NPRSTYS.RRL
QS..RSSN.YYS.SV.FTKD.DSFRSSYG.NRQ.TDNFTSDTC.K

SEQ ID NO. 5115 

STRAIN COH1 frame: 1 

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY 
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEATKG..KRRY.R.IRSN.IC.VSST.LC..PYFS.QS
KWQNITSRI.WQISHS.KCFVS.SI..H.S.NLS...SQ.RNYYHNRLFCRYKICQYHSG
.LFEQYYF.AISYW.NSRLPCHCIFKLCCY.RQGW.DLC.R.IARNFSYSA.R.IC.DWY
.ITK.CQTY..FICSSFE.G.NC..YLEK..TRH.SQQNLPIKIQPDKSSSRVYY..H.L
KFRNHDHFQRWKDARIG.TKRCFFGYKRYGHE.V.NYSTWTKGF.I.GTTYCKNWNS.IR
YVFQTISRPSFNY.KNIPYPKWCSK.IEKI.L.FWFN.KSDRWILYL.RCN.P.I.INQW
CKS.SCL.RARRSI.SSERRYD.KR.TAQSFNSSQ.KYSKSNLC.Y.LVTL..GYCEWKR
SW.R..VTFN.RMDNICIT.NRKFIKC.KFDYGDG.CK.ESSTTSFKS.U.K.AYEGYA
TYYAKRFSVLRNK.QSSPSN.SHCRY.T.F.CC.RSECSY.KYDDETVCSCWTTR.SC..
T.IPISISLNSCLIGNC..GNSKW.GNHSIWYYRSHQGW..KQAC.SQNGE.KWRHARNP
CYYSR.RLD.SNKTINEWT.STLCR.TI.VPG.ITT.SF.HLD.G.SGNRSRRESKYCSS
HVL.PISSRA.HSSC.T.FDF.YCSYPHRCQR.LSKTKIISR.FIT.IC..NRSL.F.KW
CRNH.RYDSTTRIWR.YY.VICC.LIKLSS..DPSYL.KPF.C.SKPNDS.QRS.SNCGY
VDEALSCSRNGRSLYINNRRSSKHK.IRNVNKR.SIDSLCKWWC..S.CSD.SS.LRSYS
.S.RST.S.RST.S.RST.S.RST.S.RST.S.RST.SRRST.S.RGT.NPRSTYS.RRL
QS..RSSN.YYS.SV.FTKD.DSFRSSYG.NRQ.TDNFTSDTC

SEQ ID NO. 5116 

STRAIN M781 frame: 1 

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY 
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEATKG..KRRY.R.IRSN.IC.VSST.LC..PYFS.QS
KWQNITSRI.WQISHS.KCFVS.SI..H.S.NLS...SQ.RNYYHNRLFCRYKICQYHSG
.LFEQYYF.AISYW.NSRLPCHCIFKLCCY.RQGW.DLC.R.IARNFSYSA.R.IC.DWY
.ITK.CQTY..FICSSFE.G.NC..YLEK..TRH.SQQNLPIKIQPDKSSSRVYY..H.L
KFRNHDHFQRWKDARIG.TKRCFFGYKRYGHE.V.NYSTWTKGF.I.GTTYCKNWNS.IR
YVFQTISRPSFNY.KNIPYPKWCSK.IEKI.L.FWFN.KSDRWILYL.RCN.P.I.INQW
CKS.SCL.RARRSI.SSERRYD.KR.TAQSFNSSQ.KYSKSNLC.Y.LVTL..GYCEWKR
SW.R..VTFN.RMDNICIT.NRKFIKC.KFDYGDG.CK.ESSTTSFKS.U.K.AYEGYA
TYYAKRFSVLRNK.QSSPSN.SHCRY.T.F.CC.RSECSY.KYDDETVCSCWTTR.SC..
T.IPISISLNSCLIGNC..GNSKW.GNHSIWYYRSHQGW..KQAC.SQNGE.KWRHARNP
CYYSR.RLD.SNKTINEWT.STLCR.TI.VPG.ITT.SF.HLD.G.SGNRSRRESKYCSS
HVL.PISSRA.HSSC.T.FDF.YCSYPHRCQR.LSKTKIISR.FIT.IC..NRSL.F.KW
CRNH.RYDSTTRIWR.YY.VICC.LIKLSS..DPSYL.KPF.C.SKPNDS.QRS.SNCGY
VDEALSCSRNGRSLYINNRRSSKHK.IRNVNKR.SIDSLCKWWC..S.CSD.SS.LRSYS
.S.RST.S.RST.S.RST.S.RST.S.RST.S.RST.SRRSTVKLKRDIKPKKHL.LKKA
TKLITFIKLILQLKRLIYQRLRQFPQFIWLEQTINR.LHIRHML

SEQ ID NO. 5117 

STRAIN JM9130013 frame: 2 

GVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVYLQSVKYVGG
GNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTKPSKPKDSLS
TPPGFPDLNTPPDEAPKDSKKDAIEDKSGAIKYAKSLQLSFVDDPILASPCVNGKILQVES
DGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAVDYLSNTTFE
QLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGIELPNDVRHI
DSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINSSSEIMTTFK
DGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELDMFFKQSQDP
ASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSGASLKWYKG
QEDPYSHQKEDMTKXGEQLSHSTQANENTAKVTFANIDWSHYSKVTVNGKEVGKGSELPL
TKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDMLLTMQKDSAY
YETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSEHKYPSVFLL
TPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTPVIIQGKDLT
NRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEWTEAGEKASIVRRMFFDQSVPE
LNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNGVEITKDMTV
PLEFGDNIIKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDMLMKHLWPE
MAGAYTLTIDEAPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKWDLEAIRKAEEAHKAD
EARKAEEARKAEEAHKAEEVRKAEEAHKVEEAP.S.RGT.NPRSTYS.RRLQG..RSSN.
YYS.SV.FTKD.DSFRSSYG.NRQ.TDNFTSDTC



SEQ ID NO. 5201 
STRAIN 090 

AGCGATACCTTTAATTTTGATATTGAGCAAATTGCAGA
CAATGCTATCACTAAAACAGATAAAACAACAGAAATTATTTCCAACCAGA
CAACAAGCCAAACTGGGCAAATTGCCTTTTTTGAAAAACTAACACCAGCA
CAAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGT
CGGCGATCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCG
TTAATACCACTGTTAATCATATCTTGTCTGAGCAGAAAAAAATTCAAATT
CCTCAAGTTGATGATTTACTAAAAAATGCTAATCGCGAACTAAATGGATT
TATTGCCAAATATAAAGATGCTACTCCGGCAGAATTAgAGAAAAAACCAA
ACTTGATTCAAAAATTATTCAAACAAAGCAAGACCTCGCTACAGGAATTT
TATTTTGACTCACAAAACATCGAGCAAAAAATGGATATGATGGCaGCGAA
TGTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGtCTCTGCTGAAA
TGCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAGTTATT
GCTttTATTGAATCgAGTCAAGCCGAGGCTGCTAATCGtGCAaGCCACTT
ACAACAAGAAATTCTAGCATTAGATAGCCaAACGTcCGAGTATCAAATtA
AAAGTaACCAATTAGCTCGAATGACTGAAGTTATCAATACCCTCGAACAG
CAACATACTGAATATGTCAGCCGTCTCTACGTTGCATGGGCAACAACACC
ACAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAGAAACTTG
GCATGTTACGTCGAAATACCATTCCAACAATGAAACTCTCAATCGCTCAG
TTAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTAT
TGTCAACGCTAATAATGCAGCATTGCAGATGCTGGCTGAAACTAGTAAAG
AAGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTTCTATT
AAATCTGTCACTGCATTAGCTGAAAGCTTAGTGGCTCAAAATAATGGTAT
TATCGCTGCCATAGACAAAGGACGTAAGGAACGTGCCCaATTGGAATCTG
CTGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGAT
AAAAAAATAGTTGAAGCCTTACTCAACGAAGGTaAATCTACCCAAGAAAA
AGTTGATGAGTCT

SEQ ID NO. 5202 
STRAIN A909 

AGCGATACCTTTAATTTTGATATTGACCAAATTGCAGA
CAATGCTATCACTAAAACAGATAAAACAACAGAAATTATTTCCAACCAGA
CAACAAGCCAAACTGGGCAAATTGCCTTTTTTGAAAAACTAACACCAGCA
CAAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGT
CGGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCG
TTAATACCACTGTTAATCATATCTTGTCTGAGCAGAAAAAAATTCAAATT
CCTCAAGTTGATGATTTACTAAAAAATGCTAATCGCGAACTAAATGGATT
TATTGCCAAATATAAAGATGCTACTCCGGCAGAATTAGAGAAAAAACCAA
ACTTGATTCAAAAATTATTCAAACAAAGCAAGACCTCGCTACAGGAATTT
TATTTTGACTCACAAAACATCGAGCAAAAAATGGATATGATGGCAGCGAA
TGTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGTCTCTGCTGAAA
TGCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAGTTAwT
GCTTTTATTGAATCGAGTCAAGCCGAGGCTGCCAATCGTGCAAGCCACTT
ACAACAAGAAATTCTAGCATTAGATAGCCAAACGTCCGAGTATCAAATTA
AAAGTAACCAATTAGCTCGAATGACTGAAGTTATCAATACCCTCGAACAG
CAACATACTGAATATGTCAGCCGTCTCTACGTTGCATGGGCAACAACACC
ACAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAAAAACTTG
GCATGTTACGTCGAAATACCATTCCAACaATGAAACTCTCAATCGCTCAG
TTAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTAT
TGTCAACGCTAATAATGCAGCATTGCAGATGCTGGCTGAAACTAGTAAAG
AAGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTTCTATT
AAATCTGTCACTGCATTAGCTGAAAGCTTAGTGGCTCAAAATAATGGTAT
TATCGCTGCCATAGACAAAGGACGTAAAGAACGTGCCCAATTAGAATCTG
CTGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGAT
AAAAAAATAGTTGAAGCCTTACTCAACGAAGGTaAATCTACCCAAGAAAA
AGtTGATGAGTCT 

SEQ ID NO. 5203 
STRAIN H36B 

AGCGaTACCTTTAATTTTGATATTGACCAAATTGCAGAC 
AATGCTATCACTAAAACAGATAAAACAACAGAAATTATTTCCAACCAGAC
AACAAGCCAAACTGGGCAAATTGCCTTTTTTGAAAAACTAACACCAGCAC
AAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGTC
GGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCGT
TAATACCACTGTTAATCATATCTTGTCTGAGCAGAAAAAAATTCAAATTC
CTCAAGTTGATGATTTACTAAAAAATGCTAATCGCGAACTAAATGGATTT
ATTGCCAAATATAAAGATGCTACTCCGGCAGAATTAGAGAAAAAACCAAA
CTTGATTCAAAAATTATTCAAACAAAGCAAGACCTCGCTACAGGAATTTT
ATTTTGACTCACAAAACATCGAGCAAAAAATGGATATGATGGCAGCGAAT
GTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGTcTCTGCTGAAAT
GCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAGTTATTG
CTttTATTGAATCGAGTCAAGCCGAgGCTGCCAATCGTGCAAGCCACTTA
CAACAAGAAATTCTAGCATTAGATAGCCAAACGTcCGAGTATCAAATTAA
AAGTAACCAATTAGCTCGAATGACTGAAGTTATCAATACCCTCGAACAGC
AACATACTGAATATGTCAGCCGTCTCTACGTTGCATGGGCAACAACACCA
CAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAAAAACTTGG
CATGTTACGTCGAAATACCATTCCAACaATGAAACTCTCAATCGCTCAGT
TAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTATT
GTCAACGCTAATAATGCAGCATTGCAGATGCTGGCTGAAACTAGTAAAGA
AGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTTCTATTA
AATCTGTCACTGCATTATCTGAAAGCTTAGTGGCTCAAAATAATGGTATT
ATCGCTGCCATAGACAAAGGACGTAAAGAACGTGCCCAATTAGAATCTGC
TGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGATa
AAAAAATAGTTGAAGCCTTACTCAaCGAAGGTaAATCTACCCAAGAAAAA
GTTGATGAGTCT

SEQ ID NO. 5204 
STRAIN 18RS21 

TTTTGATATTGACCAAATTGCAGACAATGCTATCACTAAAACAGATAAAA
CAACAGAAATTATTTCCAACCAGACAACAAGCCAAACTGGGCAAATTGCC
TTTTTTGAAAAACTAACACCAGCACAAAAGTCTGCTATCTCTGAAAAAAC
ACCAGCTTTGGTAGATACTTTTGTCGGCGATCAAAATGCGCTCCTTGATT
TTGGACAATCCGCAGTAGAAGGCGTTAATACCACTGTTAATCATATCTTG
TCTGAGCAGAAAAAAATTCAAATTCCTCAAGTTGATGATTTACTAAAAAA
TGCTAATCGCGAACTAAATGGATTTATTGCCAAATATAAAGATGCTACTC
CGGCAGAATTAGAGAAAAAACCAAACTTGATTCAAAAATTATTCAAACAA
AGCAAGACCTCGCTACAGGAATTTTATTTTGACTCACAAAACATCGAGCA
AAAAATGGATATGATGGCAGCGAATGTTGTCAAACAAGAAGATACTTTGG
CAAGAAATATCGTCTCTGCTGAAATGCTCATTGAAGATAATACTAAATCT
ATTGAAAATTTGGTTGGAGTTATTGCTTTTATTGAATCGAGTCAAGCCGA
GGCTGCTAATCGTGCAAGCCACTTACAACAAGAAATTCTAGCATTAGATA
GCCAAACGTCCGAGTATCAAATTAAAAGTAACCAATTAGCTCGAATGACT
GAAGTTATCAATACCCTCGAACAGCAACATCCTGAATATGTCAGCCGTCT
CTACGTTGCATGGGCAACAACACCACAGATGCGAAACTTGGTCAAAGTAT
CGTCAGATATGCGTCAGAAACTTGGCATGTTACGTCGAAATACCATTCCA
ACAATGAAACTCTCAATCGCTCAGTTAGGCATGATGCAACAATCTGTCAA
ATCCGGTGTCACTGCTGATGCTATTGTCAACGCTAATAATGCAGCATTGC
AGATGCTGGCTGAAACTAGTAAAGAAGCGATTCCGATGTTAGAGAAGACC
GCACAAAGCCCCACTGTTTCTATTAAATCTGTCACTGCATTAGCTGAAAG
CTTAGTGGCTCAAAATAATGGTATTATCGCTGCCATAGACAAAGGACGTA
AGGAACGTGCCCaATTGGAATCTGCTGTTATTAAATCGGCTGAAACAATC
AATGATTCTGTCAAAATTCGTGATAAAAAAATAGTTGAAGCCTTACTCAA
CGAAGGTaAATCTACCCAAGAAAAAGTTGATGAGTCT

SEQ ID NO. 5205 
STRAIN M732 

AGCGATACCTTTAATTTTGATATTGACCAAATTGCAGAC
AATGCTATCACTAAAACAGATAAAACAACAGAAATTATTTCCAACCAGAC
AACAAGCCAAACTGGGCAAATTGCCTTTTTTGAAAAACTAACACCAGCAC
AAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGTC
GGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCGT
TAATACTACTGTTAATCATATCTTGTCTGAGCAGAAAAAAATTCAAATTC
CTCAAGTTGATGATTTACTAAAAAATGCTAATCGCGAACTAAATGGATTT
ATTGCCAAATATAAAGATGCTACTCCGGCAGAATTAGAGAAAAAACCAAA
CTTGATTCAAAAATTATTCAAACAAAGCAAGACCTCGCTACAGGAATTTT
ATTTTGACTCACAAAACATCGAGCAAAAAATGGATATGATGGCAGCAAAT
GTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGTCTCTGCTGAAAT
GCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAGTTATTG
CTTTTATTGAATCGAGTCAAGCCGAGGCTGCCAATCGTGCAAGCCACTTA
CAACAAGAAATTCTAGCATTAGATAGCCAAACGTCCGAATATCAAATTAA
AAGTAACCAATTAGCCCGAATGACTGAAGTTATCAATACCCTCGAACAGC
AACATACGGAATATGTCAGCCGTCTCTACGTTGCATGGGCAACAACACCA
CAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAGAAACTTGG
TATGTTACGTCGAAATACCATTCCAACAATGAAACTCTCAATCGCTCAGT
TAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTATT
GTCAACGCTAATAATGCAGCATTGCAAATGCTGGCTGAAACTAGTAAAGA
AGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTTCTATTA
AATCTGTCACTGCATTAGCTGAAAGCTTAGTGGCTCAAAATAATGGTATT
ATCGCTGCCATAGACAAAGGACGTAAGGAACGTGCCCAATTAGAATCTGC
TGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGATA
AAAAAATAGTTGAAGCCTTACTCAACGAAGGTAAATCTACCCAAGAAAAA
G 

SEQ ID NO. 5206 
STRAIN COH1 

CTAAAACAGATAAAACAACAGAAATTATTTCCAACCAGACAACAAGCCAA
ACTGGGCAAATTGCCTTTTTTGAAAAACTAACACCAGCACAAAAGTCTGC
TwTCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGTCGGTGACCAAA
ATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCGTTAATACTACT
GTTAATCATATCTTGTCTGAGCAGAAAAAAATTCAAATTCCTCAAGTTGA
TGATTTACTAAAAAATGCTAATCGCGAACTAAATGGATTTATTGCCAAAT
ATAAAGATGCTACTCCGGCaGAATTAGAGAAAAAACCAAACTTGATTCAA
AAATTATTCAAACAAAGCAAGACCTCGCTACAGGAATTTTATTTTGACTC
ACAAAACATCGAGCAAAAAATGGATATGATGGCAGCAAATGTTGTCAAAC
AAGAAGATACTTTGGCAAGAAATATCGTCTCTGCTGAAATGCTCATTGAA
GATAATACTAAATCTATTGAAAATTTGGTTGGAGTTATTGCTTTTATTGA
ATCGAGTCAAGCCGAgGCTGCCAATCGTGCaAGCCACTTACAACAaGAAA
TTCTAGCaTTAGATAGCCAAACGTCCGAATATCAAATTAAAAGTAACCAA
TTAGCCCGAATGACTGAaGTTATCAaTaCCCTCGAACAGCAACATACGGA
aTATGTCAGCCGTCTCTACGTTGCATGGGCAACAACACCACAGATGCGAA
ACTTGGTCAAAGTATCGTCAGATATGCGTCAGAAACTTGGTATGTTACGT
CGAAATACCATTCCAACAATGAAACTCTCAATCGCTCAGTTAGGCATGAT
GCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTATTGTCAACGCTA
ATAATGCAGCATTGCAAATGCTGGCTGAAACTAGTAAAGAAGCGATTCCG
ATGTTAGAGAAGACCGCACAAAGCCCCACTGTTTCTATTAAATCTGTCAC
TGCATTAGCTGAAAGCTTAGTGGCTCAAAATAATGGTATTATCGCTGCCA
TAGACAAAGGACGTAAGGAACGTGCCCAATTAGAATCTGCTGTTATTAAA
TCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGATAAAAAAATAGT
TGAAGCCTTACTCAaCGAAGGTAAATCTACCCAAGAAAAAGTTGATGAGT
CT 

SEQ ID NO. 5207 
STRAIN M781 

TTTTGATATTGACCAAATTGCAGACAATGCTATCACTAAAACAGATAAAA



WO 2004/018646
PCT/US2003/026827
SEQUENCE LISTING
CAACAGAAATTATTTCCAACCAGACAACAAGCCAAACTGGGCAAATTGCC
TTTTTTGAAAAACTAACACCAGCACAAAAGTCTGCTATCTCTGAAAAAAC
ACCAGCTTTGGTAGATACTTTTGTCGGTGACCAAAATGCGCTCCTTGATT
TTGGACAATCCGCAGTAGAAGGCGTTAATACTAGTGtTAATCATATCTTG
TCTGAGCAGAAAAAAATTCAAATTCCTCAAGTTGATGATTTACTAAAAAA
TGCTAATCGCGAACTAAATGGATTTATTGCCAAATATAAAGATGCTACTC
CGGCAGAATTAGAGAAAAAACCAAACTTGATTCAAAAATTATTCAAACAA
AGCAAGACCTCGCTACAGGAATTTTATTTTGACTCACAAAACATCGAGCA
AAAAATGGATATGATGGCAGCAAATGTTGTCAAACAAGAAGATACTTTGG
CAAGAAATATCGTCTCTGCTGAAATGCTCATTGAAGATAATACTAAATCT
ATTGAAAATTTGGTTGGAGTTATTGCTTTTATTGAATCGAGTCAAGCCGA
GGCTGCCAATCGTGCAAGCCACTTACAACAAGAAATTCTAGCATTAGATA
GCCAAACGTCCGAATATCAAATTAAAAGTAACCAATTAGCCCGAATGACT
GAAGTTATCAATACCCTCGAACAGCAACATACGGAATATGTCAGCCGTCT
CTACGTTGCATGGGCAACAACACCACAGATGCGAAACTTGGTCAAAGTAT
CGTCAGATATGCGTCAGAAACTTGGTATGTTACGTCGAAATACCATTCCA
ACAATGAAACTCTCAATCGCTCAGTTAGGCATGATGCAACAATCTGTCAA
ATCCGGTGTCACTGCTGATGCTATTGTCAACGCTAATAATGCAGCATTGC
AAATGCTGGCTGAAACTAGTAAAGAAGCGATTCCGATGTTAGAGAAGACC
GCACAAAGCCCCACTGTTTCTATTAAATCTGTCACTGCATTAGCTGAAAG
CTTAGTGGCTCAAAATAATGGTATTATCGCTGCCATAGACAAAGGACGTA
AGGAACGTGCCCAATTAGAATCTGCTGTTATTAAATCGGCTGAAACAATC
AATGATTCTGTCAAAATTCGTGATAAAAAAATAGTTGAAGCCTTACTCAA
CGAAGGTAAATCTACCCAAGAAAAAGTTGATGAGTCT 

SEQ ID NO. 5208 
STRAIN CJB110 

TTTTGATATTGACCAAATTGCAGACAATGCTATCACTAAAACAGATAAAA
CAACAGAAATTATTTCCAACCAGACAACAAGCCAAACTGGGCAAATTGCC
TTTTTTGAAAAACTAACACCAGCACAAAAGTCTGCTATCTCTGAAAAAAC
ACCAGCTTTGGTAGATACTTTTGTCGGCGATCAAAATGCGCTCCTTGATT
TTGGACAATCCGCAGTAGAAGGCGTTAATACCACTGTTAATCATATCTTG
TCTGAGCAGAAAAAAATTCAAATTCCTCAAGTTGATGATTTACTAAAAAA
TGCTAATCGCGAACTAAATGGATTTATTGCCAAATATAAAGATGCTACTC
CGGCAGAATTAGAGAAAAAACCAAACTTGATTCAAAAATTATTCAAACAA
AGCAAGACCTCGCTACAGGAATTTTATTTTGACTCACAAAACATCGAGCA
AAAAATGGATATGATGGCAGCGAATGTTGTCAAACAAGAAGATACTTTGG
CAAGAAATATCGTCTCTGCTGAAATGCTCATTGAAGATAATACTAAATCT
ATTGAAAATTTGGTTGGAGTTATTGCTTTTATTGAATCGAGTCAAGCCGA
GGCTGCTAATCGTGCAAGCCACTTACAACAAGAAATTCTAGCATTAGATA
GCCAAACGTCCGAGTATCAAATTAAAAGTAACCAATTAGCTCGAATGACT
GAAGTTATCAATACCCTCGAACAGCAaCATACTGAATATGTCAGCCGTCT
CTACGTTGCATGGGCaACaACACCACAGATGCGAAACTTGGTCAAAGTAT
CGTCAGATATGCGTCAGAAACTTGGCATGTTACGTCGAAATACCATTCCA
ACAATGAAACTCTCAATCGCTCAGTTAGGCATGATGCAACAATCTGTCAA
ATCCGGTGTCACTGCTGATGCTATTGTCAACGCTAATAATGCAGCATTGC
AGATGCTGGCTgAAACTAGTAAAGAAGCGATTCCGATGTTAGAGAAGACC
GCACAAAGCCCCACTGTTTCTATTAAATCTGTCACTGCATTAGCTGAAAG
CTTAGTGGCTCAAAATAATGGTATTATCGCTGCCATAGACAAAGGACGTA
AGGAaCGTGCCCAATTGGAATCTGCTGTTATTAAATCGGCTGAAACAATC
AATGATTCTGTCAAAATTCGTGATaAAAAAATAGTTGAAGCCTTACTCAA
CGAAGGTAAATCTACCCAAGAAAAAGTTGATGAGTCT

SEQ ID NO. 5209 
STRAIN 1169NT 

GCAGACAATGCTATCACTAAAACAGATAAAACAACAGAAATTATTTCCAA
CCAGACAACAAGCCAAACTGGGCAAATTGCCTTTTTTGAAAAACTAACAC
CAGCACAAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACT
TTTGTCGGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGA
AGGCGTTAATACCACTGTTAATCATATCTTGTCTGAGCAGAAAAAAATTC
AAATTCCTCAAGTTGATGATTTACTAAAAAATGCTAATCGCGAACTAAAT
GGATTTATTGCCAAATATAAAGATGCTACTCCGGCAGAATTAGAGAAAAA
ACCAAACTTGATCCAAAAATTATTCAAACAAAGCAAGACCTCACTACAGG
AATTTTATTTTGACTCACAAAACATCGAGCAAAAAATGGATATGATGGCA
GCAAATGTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGTCTCTGC
TGAAATGCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAG
TTATTGCTTTTATTGAATCGAGTCAAGCCGAGGCTGCCAATCGTGCAAGC
CACTTAGAACAAGAAATTCTAGCATTAGATAGCCAAACGTCCGAGTATCA
AATTAAAAGTAACCAATTAGCTCGAATGACTGAAGTTATCAATACCCTCG
AaCAGCAACATACTGAATATGTCAGCCGTCTCTACGTTGCATGGGCAACA
aCACCACAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAAAA
ACTTGGCATGTTACGTCGAAATACCATTCCAACAATGAAACTCTCAATCG
CTCAGTTAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGAT
GCTATTGTCAACGCTAATAATGCAGCATTGCAGATGCTGGCTGAAACTAG
TAAAGAAGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTT
CTATTAAATCTGTCACTGCATTAGCTGAAAGCTTAGTGGCTCAAAATAAT
GGTATTATCGCTGCCATAGACAAAGGACGTAAGGAACGTGCCCAATTAGA
ATCTGCTGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTC
GTGATAAAAAAATAGTTGAAGCCTTACTCAACGAAGGTaAATCTACCCAA
GAAAAAGTTGATGAGTCT

SEQ ID NO. 5210 
STRAIN JM9130013 

AGCGATACCTTTAATTTTGATATTGACCAAATTGCAGAC
AATGCTATCACTAAAACAGATAAAACAACAGAAATTATTTCCAACCAGAC
AACAAGCCAAACTGGGCAAATTGCCTTTTTTGAAAAACTAACACCAGCAC
AAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGTC
GGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCGT
TAATACCACTGTTAATCATATCTTGTCTGAGCAGAAAAAAATTCAAATTC
CTCAAGTTGATGATTTACTAAAAAATGCTAATCGCGAACTAAATGGATTT
ATTGCCAAATATAAAGATGCTACTCCGGCAGAATTAGAGAAAAAACCAAA
CTTGATTCAAAAATTATTCAAACAAAGCAAGACCTCGCTACAGGAATTTT
ATTTTGACTCACAAAACATCGAGCAAAAAATGGATATGATGGCAGCGAAT
GTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGTCTCTGCTGAAAT
GCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAGTTATTG
CTTTTATTGAATcGAGTCAAGCCGAGGCTGCCAATCGTGCAAGCCACTTA
CAACAAGAAATTCTAGCATTAGATAGCCAAACGTCCGAGTATCAAATtAA
AAGTaACCAATTAGCTCGAATGACTGAAGTTATCAATACCCTCGAACAGC
AACATACTGAATATGTCAGCCGTCTCTACGTTGCATGGGCAACAACACCA
CAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAAAAACTTGG
CATGTTACGTCGAAATACCATTCCAACAATGAAACTCTCAATCGCTCAGT
TAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTATT
GTCAACGCTAATAATGCAGCATTGCAGATGCTGGCTGAAACTAGTAAAGA
AGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTTCTATTA
AATCTGTCACTGCATTAGCTGAAAGCTTAGTGGCTCAAAATAATGGTATT
ATCGCTGCCATAGACAAAGGaCGTAAGGAACGTGCCCAATTAGAATCTGC
TGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGATA
AAAAAATAGTTGAAGCCTTACTCAACGAAGGTaAATCTACCCAAGAAAAA
GTTGATGAGTCT

SEQ ID NO. 5211 
STRAIN 2603 

agcgatacctttaattttgatattgaccaaattgcagacaatgctatcac 
taaaacagataaaacaacagaaattatttccaaccagacaacaagccaaa 
ctgggcaaattgccttttttgaaaaactaacaccagcacaaaagtctgct 
atctctgaaaaaacaccagctttggtagatacttttgtcggcgatcaaaa 
tgcgctccttgattttggacaatccgcagtagaaggcgttaataccactg 
ttaatcatatcttgtctgagcagaaaaaaattcaaattcctcaagttgat 
gatttactaaaaaatgctaatcgcgaactaaatggatttattgccaaata 
taaagatgctactccggcagaattagagaaaaaaccaaacttgattcaaa 
aattattcaaacaaagcaagacctcgctacaggaattttattttgactca 
caaaacatcgagcaaaaaatggatatgatggcagcgaatgttgtcaaaca 
agaagatactttggcaagaaatatcgtctctgctgaaatgctcattgaag 
ataatactaaatctattgaaaatttggttggagttattgcttttattgaa 
tcgagtcaagccgaggctgctaatcgtgcaagccacttacaacaagaaat 
tctagcattagatagccaaacgtccgagtatcaaattaaaagtaaccaat 
tagctcgaatgactgaagttatcaataccctcgaacagcaacatcctgaa 
tatgtcagccgtctctacgttgcatgggcaacaacaccacagatgcgaaa 
cttggtcaaagtatcgtcagatatgcgtcagaaacttggcatgttacgtc
gaaataccattccaacaatgaaactctcaatcgctcagttaggcatgatg 
caacaatctgtcaaatccggtgtcactgctgatgctattgtcaacgctaa 
taatgcagcattgcagatgctggctgaaactagtaaagaagcgattccga 
tgttagagaagaccgcacaaagccccactgtttctattaaatctgtcact 
gcattagctgaaagcttagtggctcaaaataatggtattatcgctgccat 
agacaaaggacgtaaggaacgtgcccaattggaatctgctgttattaaat 
cggctgaaacaatcaatgattctgtcaaaattcgtgataaaaaaatagtt 
gaagccttactcaacgaaggtaaatctacccaagaaaaagttgatgagtc 
t 

SEQ ID NO. 5212 

STRAIN 090 frame: 1

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD 
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5213

STRAIN A909 frame: 1 

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD 
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA 
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVXAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM 
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN 
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5214 

STRAIN H36B frame: 1

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD 
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA 
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV 
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM 
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALSESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES

SEQ ID NO. 5215 

STRAIN 18RS21 frame: 2 

FDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFVGD
QNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAEL 
EKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDN 
TKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLE 
QQHPEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVK
SGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIA 
AIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5216 

STRAIN M732 frame: 1 

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA 
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM 
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV 
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM 
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEK

SEQ ID NO. 5217 

STRAIN COH1 frame: 3 

KTDKTTEIISNQTTCQTGQIAFFEKLTPAQKSAXSEKTPALVDTFVGDQNALLDFGQSAV 
EGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAELEKKPNLIQKLFK
QSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDNTKSIENLVGVIA 
FIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLEQQHTEYVSRLYV 
AWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVKSGVTADAIVNAN 
NAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIAAIDKGRKERAQL
ESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5218 

STRAIN COH1 frame: 3 

KTDKTTEIISNQTTCQTGQIAFFEKLTPAQKSAXSEKTPALVDTFVGDQNALLDFGQSAV 
EGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAELEKKPNLIQKLFK 
QSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDNTKSIENLVGVIA
FIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLEQQHTEYVSRLYV
AWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVKSGVTADAIVNAN
NAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIAAIDKGRKERAQL
ESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5219 

STRAIN M781 frame: 2

FDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFVGD 
QNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAEL 
EKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDN
TKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLE 
QQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVK 
SGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIA 
AIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5220 

STRAIN CJB110 frame: 2 

FDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFVGD
QNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAEL 
EKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDN 
TKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLE 
QQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVK
SGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIA
AIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5221 

STRAIN 1169NT frame: 1 

ADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFVGDQNALLD 
FGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAELEKKPNL 
IQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDNTKSIEN
LVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLEQQHTEY
VSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVKSGVTAD
AIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIAAIDKGR
KERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5222 

STRAIN JM9130013 frame: 1 

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD 
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA 
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES

SEQ ID NO. 5223 

STRAIN 2603 frame: 1

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHPEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES
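Each protein entry appears to be the translation of the nucleotide entry for the same strain, read in the frame given after "frame:". Frame 1 starts at the first base, frame 2 at the second and frame 3 at the third. For example, frame 1 of the 2603 nucleotide sequence begins agc gat acc ttt aat ttt gat (S D T F N F D). Frame 2 of the CJB110 nucleotide sequence begins T TTT GAT ATT GAC (F D I D). The Python sketch below shows such a translation under three assumptions: it uses the standard genetic code (NCBI translation table 1), it maps any codon containing an ambiguity code such as w (A or T) to the unknown residue X, and its function name, translate, is hypothetical and not part of this listing.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), assumed here.
# Codons are enumerated with the first base varying slowest, bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 1) -> str:
    """Translate a nucleotide sequence in reading frame 1, 2 or 3 (hypothetical helper).

    Whitespace and letter case are ignored. Codons containing an IUPAC
    ambiguity code (e.g. W = A or T) become X; stop codons become '*'.
    Trailing bases that do not fill a whole codon are dropped.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    clean = "".join(seq.split()).upper()
    residues = []
    for i in range(frame - 1, len(clean) - 2, 3):
        residues.append(CODON_TABLE.get(clean[i:i + 3], "X"))
    return "".join(residues)


if __name__ == "__main__":
    # Start of the 2603 nucleotide entry, read in frame 1.
    print(translate("agcgatacctttaattttgat"))
    # Start of the CJB110 nucleotide entry, read in frame 2.
    print(translate("TTTTGATATTGACCAAATTGCAGAC", frame=2))
```

The 2603 codons gcg aat gtt gtc aaa caa gaa translate this way to ANVVKQE, which matches the MMAANVVKQE stretch in the protein entries.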

SEQ ID NO. 5301 
STRAIN 2603 

acaaatactttgaaaaaagaattagttgaagctaaaaagacaattccatc 
cgtaaaagcttcaaaagtaccgcaaaaatcaacatcatcgaaagataaag 
agtttgttcttaaaccgattatcgatgtctctggttggcaacttcctaag 
gagattgattacgatacgctttcaaaaaatatttcaggtgttgttattcg 
tgtctttggtggatcaaagatatctaagactaataacgctgcttatacaa 
ctggaatcgataaatcgtttaagacccatatcaaagaatttcaaaagcga 
aatatcccagtagctgtctacagttatgcacttggttcaagtgttaaaga 
aatgaaagaagaggctcagatattttataagaatgcagctccttacaaac 
caactttttattggattgacgtagaagaggagacaatgtctaacatgaat
aaaggtgtccaagcattccgaaaagaattaaaaagacttggtgctaaaaa
tgttggtatctacattggtacttactttatgactgagcaaggcatctctg
taaaaggatttgacgctgtttggattccaacttatggtagcgattctgga
tactatgaagcggctccgcaaactgaacttaaatacgatttacaccaata
cacctctcaaggttatctaccaggawtcaatcaaccgcttgatttaaatc
aaattgcagttaataaagacaagaagaaaacttatgagaaactttttgga 
aaagtaaaagag 

SEQ ID NO. 5302 
STRAIN 090 

ACAAATACTTTGAAAAAAGAATTAG
TTGAAGCTAAAAAGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAA
AAATCAACATCATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGA
TGTCTCTGGTTGGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAA
AAAATATTTCAGGTGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCT
AAGACTAATAACGCTGCTTATACAACTGGAATCGATAAATCGTTTAAGAC
CCATATCAAAGAATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTT
ATGCACTTGGTTCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTT
TATAAGAATGCAGCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGA
AGAGGAGACAATGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAG
AATTAAAAAGACTTGGTGCTAAAAATGTTGGTATCTACATTGGTACTTAC
TTTATGACTGAGCAAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGAT
TCCAACTTATGGTAGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTG
AACTTAAATACGATTTACACCAATACACCTCTCAAGGTTATCTACCAGGA
TTCAATCAACCGCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAA
GAAAACTTATGAGAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5303 
STRAIN A909 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAA
AGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATCA
TCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTTG
GCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCAG
GTGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCTAAGACTAATAAC
GCTGCTTATACAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAGA
ATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGTT
CAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGCA
GCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAAT
GTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAATTAAAAAGAC
TTGGTGCTAAAAATGTTGGTATCTACATTGGTACTTACTTTATGACTGAG
CAAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATGG
TAGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTGAACTTAAATACG
ATTTAGACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACCG
CTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATGA
GAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5304 
STRAIN H36B 

ACAAATACTTTGAAAAAAGAATTAG
TTGAAGCTAAAAAGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAA
AAATCAACATCATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGA
TGTCTCTGGTTGGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAA
AAAATATTTCAGGTGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCT
AAGACTAATAACGCTGCTTATACAACTGGAATCGATAAATCGTTTAAGAC
CCATATCAAAGAATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTT
ATGCACTTGGTTCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTT
TATAAGAATGCAGCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGA
AGAGGAGACAATGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAG
AATTAAAAAGACTTGGTGCTAAAAATGTTGGTATCTACATTGGTACTTAC
TTTATGACTGAGCAAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGAT
TCCAACTTATGGTAGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTG
AACTTAAATACGATTTACACCAATACACCTCTCAAGGTTATCTACCAGGA
TTCAATCAACCGCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAA
GAAAACTTATGAGAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5305 
STRAIN 18RS21 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAAA
GACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATCAT
CGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTTGG
CAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCAGG
TGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCTAAGACTAATAACG
CTGCTTATACAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAGAA
TTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGTTC
AAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGCAG
CTCCTTACAAACCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAATG
TCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAATTAAAAAGACT
TGGTGCTAAAAATGTTGGTATCTACATTGGTACTTACTTTATGACTGAGC
AAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATGGT
AGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTGAACTTAAATACGA
TTTAGACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACCGC
TTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATGAG
AAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5306 
STRAIN M732 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAA
AAGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATC
ATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTT
GGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCA
GGTGTTGTTATTCGTATCTTTGGTGGATCAAAGATATCTAAGACTAATAA
CGCTGCTTATACAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAG
AATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGT
TCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGC
AGCTCCTTACAAaCCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAA
TGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAGTTAAAAAGA
CTTGGTGCTAAAAATGTTGGTATCTACATCGGTACTTACTTTATGACTGA
GCAAGGTATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATG
GTAGCGATTCTGGATACTATGAAGCAGCTCCACAAACTGAACTTAAATAC
GATTTACACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACC
GCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATG
AGAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5307 
STRAIN COH1 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAA
AGACAATTCCATCCGTAAAAGCTTCAAAAGTAGCGCAAAAATCAACATCA
TCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTTG
GCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCAG
GTGTTGTTATTCGTATCTTTGGTGGATCAAAGATATCTAAGACTAATAAC
GCTGCTTATACAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAGA
ATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGTT
CAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGCA
GCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAAT
GTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAGTTAAAAAGAC
TTGGTGCTAAAAATGTTGGTATCTACATCGGTACTTACTTTATGACTGAG
CAAGGTATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATGG
TAGCGATTCTGGATACTATGAAGCAGCTCCACAAACTGAACTTAAATACG
ATTTACACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACCG
CTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATGA
GAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5308 
STRAIN M781 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAA
AAGACAATTCCATCcGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATC
ATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTT
GGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCA
GGTGTTGTTATTCGTATCTTTGGTGGATCAAAGATATCTAAGACTAATAA
CGCTGCTTATACAACTGGAATCGATAAATcGTTTAAGACCCATATCAAAG
AATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGT
TCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGC
AGCTCCTTACAAACCAACTTTTTatTGGATTGACGTAGAAGAGGAGaCAA
TGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAGTTAAAAAGA
CTTGGTGCTAAAAATGTTGGTATCTACATCGGTACTTACTTTATGACTGA
GCAAGGTATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATG
GTAGCGATTCTGGATACTATGAAGCAGCTCCACAAACTGAACTTAAATAC
GATTTACACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACC
GCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATG
AGAAACTTTTTGGAAAAGTAAAAGAG 

SEQ ID NO. 5309 
STRAIN CJB110 

AAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAAAGACAATTCCATCCG
TAAAAGCTTCAAAAGTACCGCAAAAATCAACATCATCGAAAGATAAAGAG
TTTGTTCTTAAACCGATTATCGATGTCTCTGGTTGGCAACTTCCTAAGGA
GATTGATTACGATACGCTTTCAAAAAATATTTCAGGTGTTGTTATTCGTG
TCTTTGGTGGATCAAAGATATCTAAGACTAATAACGCTGCTTATACAACT
GGAATCGATAAATCGTTTAAGACCCATATCAAAGAATTTCAAAAGCGAAA
TATCCCAGTAGCTGTCTACAGTTATGCACTTGGTTCAAGTGTTAAAGAAA
TGAAAGAAGAGGCTCAGATATTTTATAAGAATGCAGCTCCTTACAAACCA
ACTTTTTATTGGATTGACGTAGAAGAGGAGACAATGTCTAACATGAATAA
AGGTGTCCAAGCATTCCGAAAAGAATTAAAAAGACTTGGTGCTAAAAATG
TTGGTATCTACATTGGTACTTACTTTATGACTGAGCAAGGCATCTCTGTA
AAAGGATTTGACGCTGTTTGGATTCCAACTTATGGTAGCGATTCTGGATA
CTATGAAGCGGCTCCGCAAACTGAACTTAAATACGATTTACACCAATACA
CCTCTCAAGGTTATCTACCAGGATTCAATCAACCGCTTGATTTAAATCAA
ATTACAGTTAATAAAGACAAGAAGAAAACTTATGAGAAACTTTTTGGAAA
AGTAAAAGAG 

SEQ ID NO. 5310 
STRAIN 1169NT 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAAAGACAATTCC
ATCCGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATCATCGAAAGATA
AAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTTGGCAACTTCCT
AAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCAGGTGTTGTTAT
TCGTGTCTTTGGTGGATCAAAGATATCTAAGACTAATAACGCTGCTTATA
CAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAGAATTTCAAAAG
CGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGTTCAAGTGTTAA
AGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGCAGCTCCTTACA
AACCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAATGTCTAACATG
AATAAAGGTGTCCAAGCATTCCGAAAAGAATTAAAAAGACTTGGCGCTAA
AAATGTTGGTATCTACATCGGTACTTACTTTATGACTGAGCAAGGTATCT
CTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATGGTAGCGATTCT
GGATACTATGAAGCAGCTCCGCAAACTGAACTTAAATACGATTTACACCA
ATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACCGCTTGATTTAA
ATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATGAGAAACTTTTT
GGAAAAGTAAAAGAG

SEQ ID NO. 5311
STRAIN JM9130013 

ACAAATACTTTGAAAAAAGAATTAG
TTGAAGCTAAAAAGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAA
AAATCAACATCATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGA
TGTCTCTGGTTGGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAA
AAAATATTTCAGGTGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCT
AAGACTAATAACGCTGCTTATACAACTGGAATCGATAAATCGTTTAAGAC
CCATATCAAAGAATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTT
ATGCACTTGGTTCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTT
TATAAGAATGCAGCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGA
AGAGGAGACAATGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAG
AATTAAAAAGACTTGGTGCTAAAAATGTTGGTATCTACATTGGTACTTAC
TTTATGACTGAGCAAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGAT
TCCAACTTATGGTAGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTG
AACTTAAATACGATTTACACCAATACACCTCTCAAGGTTATCTACCAGGA
TTCAATCAACCGCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAA
GAAAACTTATGAGAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5312 

STRAIN 2603 frame: 1

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN 
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGXNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5313 

STRAIN 090 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5314 

STRAIN A909 frame: 1

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN 
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5315 

STRAIN H36B frame: 1

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN 
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5316 

STRAIN 18RS21 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN 
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5317 

STRAIN M732 frame: 1

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN 
ISGVVIRIFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5318 

STRAIN COH1 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRIFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5319 

STRAIN M781 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRIFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5320 

STRAIN CJB110 frame: 2 

NTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKNI
SGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKEE
AQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQG
ISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQITVNKDK
KKTYEKLFGKVKE

SEQ ID NO. 5321 

STRAIN 1169NT frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5322 

STRAIN JM9130013 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5401 
STRAIN 2603 

TTGACTCACAAAAATATATTATTAACCATTATATTTGGATTATTT
ATGATTATATTATCAGCATGTGGTATGTCTAATAAGGAAATGGCTGGTATTGATAATTGG
GAACATTATCAAAAGGAAAAGAAAATTACTATTGGATTTGATAATACTTTTGTTCCTATG
GGATTTGAAAGTCGTTCTGGTGACTATACCGGCTTTGATATTGATTTAGCTAATGCTGTT
TTTAAAGAATACGGTATTTCAGTGAAATGGCAGCCTATTAACTGGGATATGAAAGAAACT
GAACTTAATAATGGTAATATAGACCTTATTTGGAATGGTTATTCAAAAACGGCAGAACGT
GCTAAAAAAGTCGCTTTTACAAACCCATATATGAATAATCATCAAGTAATTGTTACTAAA
ACTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAAAACTAGGAGCCCAGTCG
GGTTCATCTGGTTTTGATGCTTTTAACGCTAAACCTGATATTTTAAAAAAGTTTGTAAAA
GGAAAAGAAGCAGTTCAATACGATACTTTCACTCAGGCTTTGATTGATTTAAAAAATAAC
CGTATTGATGGTCTTTTGATTGATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGA
AATATAAAAGCTTATTATTTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGTAGTAGGA
GCTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAGCTTTCAAACAGCTTCAT
AATAAGGGGAGATTTCAAAAAATCTCTTACAAATGGTTTGGTGAAGATGTTTATAGTAAA
GAA

SEQ ID NO. 5402 

STRAIN 090 
ATTGGGaACATTATC
AAAAGGAAAAGAAAATTACTATTGGATTTGATAATACTTTTGTTCCTATG
GGATTTGAAAGCCGTTCTGGTGACTAtACCGGCTTTGATATTGATTTAGC
TAATGCTGTTTTTAAAGAATACGGTATTTCAGTGAAATGGCAGCCTATTA
ACTGGGATATGAAAGAAACTGAACTTAATAATGGTAATATAGACCTTATT
TGGAATGGTTATTCAAAAACGGCAGAACGTGCTAAAAAAGTCGCTTTTAC
AAACCCATATATGAATAATCATCAAGTAATTGTTACTAAAACTTCATCAC
ATATTAATAGTATTAAGGATATGAAGGGGAAAAAACTAGGAGCCCAGTCG
GGTTCATCTGGTTTTGATGCTTTTAATGCTAAACCTGATATTTTAAAAAA
GTTTGTAAAAGGAAAAGAAGCAGTTCAATACGATACTTTCACTCAGGCTT
TGATTGATTTAAAAAATAACCGTATTGATGGTCTTTTGATTGATGAAGTT
TATGCTAACTATTATTTAAAGCAAGAAGGAAATATAAAAGCTTATTATTT
TGTTAAAACTGCTTATCAAGGAGAAAATTTTGTAGTAGGAGCTCGCAAAG
TTGATCGTAGACTAATTGAAAAGATTAACAAAGCTTTCAAACAGCTTCAT
AATAAGGGAAAATTTCAAAAAATCTCTTACAAATGGTTTGGTGAAGATGT
TTATAGTAAAGAA

SEQ ID NO. 5403 

STRAIN A909 
ATTGGG
aACATTATCAAAAGGAAAAGAAAATTACTATTGGATTTGATAATACTTTT
GTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCTTTGATAT
TGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGTGAAATGGC
AGCCTATTAACTGGGATAtgAAAGAAACTGAACTTAATAATGGTAATATA
GACCTTATTTGGAATGGTTATTCAAAAACGGCAGAACGTGCTAAAAAAGT
CGCTTTTACAAACCCATATATGAATAATCATCAAGTAATTGTTACTAAAA
CTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAAAACTAGGA
GCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAACCTGATAT
TTTAAAAAAGTTTGTAAAAGGAAAAGAAGCAGtTCAATACGATACTTTCA
CTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGATGGTCTTTTGATT
GATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGAAATATAAAAGC
TTATTATTTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGTAGTAGGAG
CTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAGCTTTCAAA
CAGCTTCATAATAAGGGGAGATTTCAAAAAATCTCTTACAAATGGTTTGG
TGAAGATGTTTATAGTAAAGaA

SEQ ID NO. 5404 

STRAIN H36B

ATTGGGAACATTATCAAAAGGAAAAGAAAATTACTATTGGATT
TGATAATACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATA
CCGGCTTTGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATT
TCAGTGAAATGGCAGCCTATTAACTGGGATATGAAAGAAACTGAACTTAA
TAATGGTAATATAGACCTTATTTGGAATGGTTATTCAAAAACGGCAGAAC
GTGCTAAAAAAGTCGCTTTTACAAACCCATATATGAATAATCATCAAGTA
ATTGTTACTAAAACTTCATCACATATTAATAGTATTAAGGATATGAAGGG
GAAAAAACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACG
CTAAACCTGATATTTTAAAAAAGTTTGTAAAAGGAAAAGAAGCAGtTCAA
TACGATACTTTCACTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGA
TGGTCTTTTGATTGATGAAGTtTATGCTAACTATTATTTAAAGCAAGAAG
GAAATATAAAAGCTTATTATTTTGTTAAAACTGCTTATCAAGGAgAAAAT
TTTGTAGTAGGAGCTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAA
CAAAGCTTTCAAACAGCTTCATAATAAGGGGAGATTTCAAAAAATCTCTT
ACAAATGGTTTGGTGAAGATGTTTATAGTAAAGAA

SEQ ID NO. 5405 

STRAIN 18RS21 
ATTGGGAACATTA
TCAAAAGGAAAAGAAAATTACTATTGGATTTGATAATACTTTTGTTCCTA
TGGGATTTGAAAGTCGTTCTGGTGACTAtACCGGCTTTGATATTGATTTA
GCTAATGCTGTTTTTAAAGAATACGGTATTTCAGTGAAATGGCAGCCTAT
TAACTGGGATATGAAAGAAACTGAACTTAATAATGGTAATATAGACCTTA
TTTGGAATGGTTATTCAAAAACGGCAGAACGTGCTAAAAAAGTCGCTTTT
ACAAACCCATATATGAATAATCATCAAGTAATTGTTACTAAAACTTCATC
ACATATTAATAGTATTAAGGATATGAAGGGGAAAAAACTAGGAGCCCAGT
CGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAACCTGATATTTTAAAA
AAGTTTGTAAAAGGAAAAGAAGCAGTTCAATACGATACTTTCACTCAGGC
TTTGATTGATTTAAAAAATAACCGTATTGATGGTCTTTTGATTGATGAAG
TTTATGCTAACTATTATTTAAAGCAAGAAGGAAATATAAAAGCTTATTAT
TTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGTAGTAGGAGCTCGTAA
AGTTGATCGTAGACTAATTGAAAAGATTAACAAAGCTTTCAAACAGCTTC
ATAATAAGGGGAGATTTCAAAAAATCTCTTACAAATGGTTTGGTGAAGAT
GTTTATAGTAAAGAA

SEQ ID NO. 5406 

STRAIN M732 

ATTGGGAACATTATCAAAAGGAAAAGAAAATTACTATTGGATTTGATAA
TACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCT
TTGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGTG
AAATGGCAGCCTATTAACTGGGATATGAAAGAAACTGAACTTAATAATGG
TAATATAGACCTTATTTGGAATGGTTATTCAAAAACGGCAGAACGTGCTA
AAAAAGTCGCTTTTACAAACCCATATATGAATAATCATCAAGTAATTGTT
ACTAAAACTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAAA
ACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAAC
CTGATATTTTAAAAAAGTTTGTAAAAGGAAAAGAAGCAGTTCAATACGAT
ACTTTCACTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGATGGTCT
TTTGATTGATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGAAATA
TAAAAGCTTATTATTTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGTA
GTAGGAGCTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAGC
TTTCAAACAGCTTCATAATAAGGGGAGATTTCAAAAAATCTCTTACAAAT
GGTTTGGTGAAGATGTTTATAGTAAAGAA

SEQ ID NO. 5407 

STRAIN COH1 

ATTGGGAACATTATCAAAAGGAAAAGAAAATTACTATTGGATTTGATAA
TACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCT
TTGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGTG
AAATGGCAGCCTATTAACTGGGATATGAAAGAAACTGAACTTAATAATGG
TAATATAGACCTTATTTGGAATGGTTATTCAAAAACGGCAGAACGTGCTA
AAAAAGTCGCTTTTACAAACCCATATATGAATAATCATCAAGTAATTGTT
ACTAAAACTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAAA
ACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAAC
CTGATATTTTAAAAAAGTTTGTAAAAGGAAAAGAAGCAGTTCAATACGAT
ACTTTCACTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGATGGTCT
TTTGATTGATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGAAATA
TAAAAGCTTATTATTTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGTA
GTAGGAGCTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAGC
TTTCAAACAGCTTCATAATAAGGGGAGATTTCAAAAAATCTCTTACAAAT
GGTTTGGTGAAGATGTTTATAGTAAAGAA

SEQ ID NO. 5408 

STRAIN M781

ATTGGGAACATTATCAAAAGGAAAAGAAAATTACTATTGGATTTGATA
ATACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGC
TTTGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGT
GAAATGGCAGCCTATTAACTGGGATATGAAAGAAACTGAACTTAATAATG
GTAATATAGACCTTATTTGGAATGGTTATTCAAAAACGGCAGAACGTGCT
AAAAAAGTCGCTTTTACAAACCCATATATGAATAATCATCAAGTAATTGT
TACTAAAACTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAA
AACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAA
CCTGATATTTTAAAAAAGTTTGTAAAAGGAAAAGAAGCAGTTCAATACGA
TACTTTCACTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGATGGTC
TTTTGATTGATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGAAAT
ATAAAAGCTTATTATTTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGT
AGTAGGAGCTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAG
CTTTCAAACAGCTTCATAATAAGGGGAGATTTCAAAAAATCTCTTACAAA
TGGTTTGGTGAAGATGTTTATAGTAAAGaA

SEQ ID NO. 5409 
STRAIN CJB110

ATTGGGAACATTATCAAAAGGAAAAGAAAATTACTATTGGATTTGATAAT
ACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCTT
TGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGTGA
AATGGGAGCCTATTAACTGGGATATGAAAGAAACTGAACTTAATAATGGT
AATATAGACCTTATTTGGAATGGTTATTCAAAAACGGCAGAACGTGCTAA
AAAAGTCGCTTTTACAAACCCATATATGAATAATCATCAAGTAATTGTTA
CTAAAACTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAAAA
CTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAACC
TGATATTTTAAAAAAGTTTGTAAAAGGAAAAGAAGCAGTTCAATACGATA
CTTTCACTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGATGGTCTT
TTGATTGATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGAAATAT
AAAAGCTTATTATTTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGTAG
TAGGAGCTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAGCT
TTCAAACAGCTTCATAATAAGGGGAGATTTCAAAAAATCTCTTACAAATG
GTTTGGTGAAGATGTTTATAGTAAAGAA

SEQ ID NO. 5410 

STRAIN 1169NT 

ATTGGGAACATTATCAAAAGGAAAAGAAAATTACTATTGGATTTGATAA
TACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCT
TTGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGTG
AAATGGCAGCCTATTAACTGGGATATGAAAGAAACTGAACTCAATAATGG
TAATATAGACCTTATTTGGAATGGTTATTCAAAAACGGCAGAACGTGCTA
AAAAAGTCGCTTTTACAAACCCATATATGAATAATCATCAAGTAATTGTT
ACTAAAACTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAAA
ACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAATGCTAAAC
CTGACATTTTAAAAAAGTTTGTAAAAGGAAAAGAAGCAGTTCAATACGAT
ACTTTCACTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGATGGTCT
TTTGATTGATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGAAATA
TAAAAGCTTATTATTTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGTA
GTAGGAGCTCGCAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAGC
TTTCAAACAGCTTCATAATAAGGGGAAATTTCAAAAAATCTCTTACAAAT
GGTTTGGTGAAGATGTTTATAGTAAAGAA

SEQ ID NO. 5411 

STRAIN JM9130013 
ATTGGGAACATTATC
AAAAGGAAAAGAAAATTACTATTGGATTTGATAATACTTTTGTTCCTATG
GGATTTGAAAGTCGTTCTGGTGACTAtACCGGCTTTGATATTGATTTAGC
TAATGCTGTTTTTAAAGAATACGGTATTTCAGTGAAATGGCAGCCTATTA
ACTGGGATATGAAAGAAACTGAACTTAATAATGGTAATATAGACCTTATT
TGGAATGGTTATTCAAAAACGGCAGAACGTGCTAAAAAAGTCGCTTTTAC
AAACCCATATATGAATAATCATCAAGTAATTGTTACTAAAACTTCATCAC
ATATTAATAGTATTAAGGATATGAAGGGGAAAAAACTAGGAGCCCAGTCG
GGTTCATCTGGTTTTGATGCTTTTAACGCTAAACCTGATATTTTAAAAAA
GTTTGTAAAAGGAAAAGAAGCAGTTCAATACGATACTTTCACTCAGGCTT
TGATTGATTTAAAAAATAACCGTATTGATGGTCTTTTGATTGATGAAGTT
TATGCTAACTATTATTTAAAGCAAGAAGGAAATATAAAAGCTTATTATTT
TGTTAAAACTGCTTATCAAGGAGAAAATTTTGTAGTAGGAGCTCGTAAAG
TTGATCGTAGACTAATTGAAAAGATTAACAAAGCTTTCAAACAGCTTCAT
AATAAGGGGAGATTTCAAAAAATCTCTTACAAATGGTTTGGTGAAGATGT
TTATAGTAAAGAA

SEQ ID NO. 5412 

STRAIN 2603 frame: 1

LTHKNILLTIIFGLFMIILSACGMSNKEMAGIDNWEHYQKEKKITIGFDNTFVPMGFESR
SGDYTGFDIDLANAVFKEYGISVKWQPINWDMKETELNNGNIDLIWNGYSKTAERAKKVA
FTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQSGSSGFDAFNAKPDILKKFVKGKEAV
QYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQEGNIKAYYFVKTAYQGENFVVGARKVD
RRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYSKE

SEQ ID NO. 5413 

STRAIN 090 frame: 3

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGKFQKISYKWFGEDVYS
KE

SEQ ID NO. 5414 

STRAIN A909 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE

SEQ ID NO. 5415 

STRAIN H36B frame: 3

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE

SEQ ID NO. 5416 

STRAIN 18RS21 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE

SEQ ID NO. 5417 

STRAIN M732 frame: 3

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE

SEQ ID NO. 5418 

STRAIN COH1 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE

SEQ ID NO. 5419 

STRAIN M781 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE

SEQ ID NO. 5420 

STRAIN CJB110 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE

SEQ ID NO. 5421 

STRAIN 1169NT frame: 3

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGKFQKISYKWFGEDVYS
KE

SEQ ID NO. 5422 

STRAIN JM9130013 frame: 3

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE

SEQ ID NO. 5501 
STRAIN 2603 

ATGCTTAAATCTTTTTTGATTTTCTTAGTTCGCTTTTACCAAAAAAATATTTCTCCAGCT
TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGAAGCTATTCAA
AAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTATTTTGCGATGTCATCCCTTA
GCCCACGGAGGAAATGATCCTGTCCCTGATCATTTTAGCTTAAGACGTAATAAAACGGAT
ATATCAGAT

SEQ ID NO. 5502 

STRAIN 090 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT
CATTTTAGCTT

SEQ ID NO. 5503 

STRAIN A909

TTCCCAGCTAGCTGTCGTTATCGTCCAACtTGCTCTACGTATATGATAGA
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT
CATTTTAgCTTAAGACGTAATAAAACGGATATA

SEQ ID NO. 5504 

STRAIN H36B 

TTCCCAGCTAGCTGTCGTTATCGTCCaACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTTCTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5505 

STRAIN 18RS21 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5506 

STRAIN M732 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA
TTTTGCGATGTCATCCCTTAgCCCACGGAGGAAATGATCCTGTCCCTGAT
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5507 

STRAIN COH1 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGAAGCTATTCAA 
AAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTATTTTGCGATGTCATCCCTTA 
GCCCACGGAGGAAATGAtCCTGtCCCTGATCATTTTAGCT 

SEQ ID NO. 5508 
STRAIN M781

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5509 

STRAIN CJB110 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5510 

STRAIN 1169NT 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGGTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
TATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5511 

STRAIN JM9130013 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTTCTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5512 

STRAIN 2603 frame: 1

MLKSFLIFLVRFYQKNISPAFPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPL 
AHGGNDPVPDHFSLRRNKTDISD 

SEQ ID NO. 5513 

STRAIN 090 frame: 1

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFS 

SEQ ID NO. 5514 

STRAIN A909 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
I 

SEQ ID NO. 5515 

STRAIN H36B frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5516 

STRAIN 18RS21 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5517 

STRAIN M732 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5518 

STRAIN COH1 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFS 

SEQ ID NO. 5519 

STRAIN M781 frame: 1

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 
SEQ ID NO. 5520
STRAIN CJB110 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5521 

STRAIN 1169NT frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVVMGIARILRCHPLAHGGNDPVPDYFSLRRNKTD
ISD 

SEQ ID NO. 5522 

STRAIN JM9130013 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5601 
STRAIN 2603 

aagaagcttacttttatttgggatttagatgggacattaatagattcgta 
tgtaccaattatggaagctcttgaagaaacctatcgtcattttggtttaa 
tatttgataaagaattaatccatgaatatattttacaggaatcagtgggg 
aaattattggtaaacctttcagaggaagagcaaatacctcatgaaaaact 
gaaagcatattttacaaaagaacaagaaagtcgagattctaaaatacatt 
taatgccatatgcaaaagagattttagaatggaccaaagaacaagatatc 
cccaattttatgtatacacataaaggagcaagtacgcattcagtgttgga 
aaccttgcagatctctcattattttgatgaaattttaactggtgtttcgg 
gattcgagcgaaaaccacatccacaagggattaattatttagttaaacga 
tattctttagataaatcaatgacttattacataggagatcgtccactaga 
tttggaggttgctcaaaatgctggtataaaatccataaacttaaggttag 
agaattccaaagaaaactataatatttcaagtctcaaagatataatatca 
cttgatttcactcgtttggat 

SEQ ID NO. 5602 
STRAIN COH1 

AAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATTAA
TAGATTCGTATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCAT
TTTGGCTTAATATTTGATAAAGAATTAATCCATGAATATATTTTACAGGA
ATCAGTGGGGCAATTATTGGTAAACCTTTCAGAGGAAGAGCAAATACCTC
ATGAAAAACTGAAAGCATATTTTACAAAAGAACAAGAAAGTCGAGATTCT
AAAATACATTTAATGCCATATGCAAAAGAGATTTTAGAATGGACCAAAGA
ACAAGATATTCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATT
CAGTGTTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACT
GGTGTTTCGGGATTCGAGCGAAAACCACATCCACAAGGGATTAATTATTT
AGTTAAACGATATTCTTTAGATAAATCAATGACTTATTAGATAGGAGATC
GTCCACTAGATTTGGAGGTTGCTCAAAATGCTGGTATAAAATCCATAAAC
TTAAGGTTAGAGAATTCCAAAGAAAACTATAATATTTCAAGTCTCAAAGA
TATAATATCACTTGATTTCACTCGTTTGGAT

SEQ ID NO. 5603 

STRAIN A909

AAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATTAAT
AGATTCGTATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGTTTAAT
ATTTGATAAAGAATTAATCCATGAATATATTTTACAGGAATCAGTGGGGAAATTATTGGT
AAACCTTTCAGAGGAAGAGCAAATACCTCATGAAAAACTGAAAGCATATTTTAGAAAAGA
ACAAGAAAGTCGAGATTCTAAAATACATTTAATGCCATATGCAAAAGAGATTTTAGAATG
GACCAAAGAACAAGATATCCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTC
AGTGTTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGG
ATTCGAGCGAAAACCACATCCACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGA
TAAATCAATGACTTATTACATAGGAGATCGTCCACTAGATTTGGAGGTTGCTCAAAATGC
TGGTATAAAATCCATAAACTTAAGGTTAGAGAATTCCAAAGAAAACTATAATATTTCAAG
TCTCAAAGATATAATATCACTTGATTTCACTCGT

SEQ ID NO. 5604 

STRAIN H36B

AAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATTAATAGATTCG
TATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGTTTAATATTTGAT
AAAGAATTAATCCATGAATATATTTTACAGGAATCAGTGGGGAAATTATTGGTAAACCTT
TCAGAGGAAGAGCAAATACCTCATGAAAAACTGAAAGCATATTTTACAAAAGAACAAGAA
AGTCGAGATTCTAAAATACATTTAATGCCATATGCAAAAGAGATTTTAGAATGGACCAAA
GAACAAGATATCCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTCAGTGTTG
GAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATTCGAG
CGAAAACCACATCCACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGATAAATCA
ATGACTTATTACATAGGAGATCGTCCACTAGATTTGGAGGTTGCTCAAAATGCTGGTATA
AAATCCATAAACTTAAGGTTAGAGAATTCCAAAGAAAACTATAATATTTCAAGTCTCAAA
GATATAATATCACTTGATTTCACTCGTTTGGAT

SEQ ID NO. 5605 

STRAIN 18RS21 

AAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATTAATAGATT
CGTATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGTTTAATATTTG
ATAAAGAATTAATCCATGAATATATTTTACAGGAATCAGTGGGGAAATTATTGGTAAACC
TTTCAGAGGAAGAGCAAATACCTCATGAAAAACTGAAAGCATATTTTACAAAAGAACAAG
AAAGTCGAGATTCTAAAATACATTTAATGCCATATGCAAAAGAGATTTTAGAATGGACCA
AAGAACAAGATATCCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTCAGTGT
TGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATTCG
AGCGAAAACCACATCCACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGATAAAT
CAATGACTTATTAGATAGGAGATCGTCCACTAGATTTGGAGGTTGCTCAAAATGCTGGTA
TAAAATCCATAAACTTAAGGTTAGAGAATTCCAAAGAAAACTATAATATTTCAAGTCTCA
AAGATATAATATCACTTGATTTCACTCGTTTGGAT

SEQ ID NO. 5606 

STRAIN M732

AAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATTAATAGAT
TCGTATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGCTTAATATTT
GATAAAGAATTAATCCATGAATATATTTTACAGGAATCAGTGGGGCAATTATTGGTAAAC
CTTTCAGAGGAAGAGCAAATACCTCATGAAAAACTGAAAGCATATTTTACAAAAGAACAA
GAAAGTCGAGATTCTAAAATACATTTAATGCCATATGCAAAAGAGATTTTAGAATGGACC
AAAGAACAAGATATTCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTCAGTG
TTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATTC
GAGCGAAAACCACATCCACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGATAAA
TCAATGACTTATTACATAGGAGATCGTCCACTAGATTTGGAGGTTGCTCAAAATGCTGGT
ATAAAATCCATAAACTTAAGGTTAGAGAATTCCAAAGAAAACTATAATATTTCAAGTCTC
AAAGATATAATATCACTTGATTTCACTCGTTTGGAT

SEQ ID NO. 5607 

STRAIN CJB110 

AAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATT
AATAGATTCGTATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGCTT
AATATTTGATAAAGAATTAATCCATGAATATATTTTACAGGAATCAGTGGGGCAATTATT
GGTAAACCTTTCAGAGGAAGAGCAAATACCTCATGAAAAACTGAAAGCATATTTTACAAA
AGAACAAGAAAGTCGAGATTCTAAAATACATTTAATGCCATATGCAAAAGAGATTTTAGA
ATGGACCAAAGAACAAGATATCCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCA
TTCAGTGTTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTC
TGGATTCGAGCGAAAACCACATCCACAAGGGATTAATTATTTAGTTAAACGATATTCTTT
AGATAAATCAATGACTTATTACATAGGAGATCGTCCCCTAGATTTGGAGGTTGCTCAAAA
TGCTGGTATAAAATCCATAAACTTAAGGTTAGAGAATTCCAAAGAAAACTATAATATTTC
AAGTCTCAAGGATATAATATCACTTGATTTCACTCGTT

SEQ ID NO. 5608 

STRAIN 1169NT 

aAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATTAATAGATTCGTATGTACCAATTA
TAGAAGCTCTTGAAGAAACCTATCGTCATTTTGGCTTAATATTTGATAAAGAATTAATCC
ATGAATATATTTTAGAGGAATCAGTGGGGAAATTATTGGTAAACCTTTCAGAGGAAGAGC
AAATACCTCATGAAAAACTGAAAGCATATTTTACAAAAGAACAAGAAAGTCGAGATTCTA
AAATACATTTAATGCCATACGCAAAAGAGATTTTAGAATGGACCAAAGAACAAGATATCC
CCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTCAGTGTTGGAAACCTTGCAGA
TCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATTCGAGCGAAAACCACATC
CACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGATAAATCAATGACTTATTACA
TAGGAGATCGTCCCCTAGATTTGGAGGTTGCTCAAAATGCTGGTATAAAATCCATAAACT
TAAGGTTAGAGAATTCCAAAGAAAACTATAATATTTCAAGTCTCAAGGATATAATATCAC
TTGATTTCACTCGTTTGGAT 

SEQ ID NO. 5609 

STRAIN JM9130013 

AAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATTAATAGA
TTCGTATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGTTTAATATT
TGATAAAGAATTAATCCATGAATATATTTTACAGGAATCAGTGGGGAAATTATTGGTAAA
CCTTTCAGAGGAAGAGCAAATACCTCATGAAAAACTGAAAGCATATTTTACAAAAGAACA
AGAAAGTCGAGATTCTAAAATACATTTAATGCCATATGCAAAAGAGATTTTAGAATGGAC
CAAAGAACAAGATATCCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTCAGT
GTTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATT
CGAGCGAAAACCACATCCACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGATAA
ATCAATGACTTATTACATAGGAGATCGTCCACTAGATTTGGAGGTTGCTCAAAATGCTGG
TATAAAATCCATAAACTTAAGGTTAGAGAATTCCAAAGAAAACTATAATATTTCAAGTCT
CAAAGATATAATATCACTTGATTTCACTCGT

SEQ ID NO. 5610 

STRAIN 090 

AAGAAGCTTACTTTTATTTGG
GATTTAGATGGGACATTAATAGATTCGTATGTACCAATTATGGAAGCTCT
TGAAGAAACCTATCGTCATTTTGGCTTAATATTTGATAAAGAATTAATCC
ATGAATATATTTTACAGGAATCAGTGGGGCAATTATTGGTAAACCTTTCA
GAGGAAGAGCAAATACCTCATGAAAAACTGAAAGCATATTTTACAAAAGA
ACAAGAAAGTCGAGATTCTAAAATACATTTAATGCCATATGCAAAAGAGA
TTTTAGAATGGACCAAAGAACAAGATATCCCCAATTTTATGTATACACAT
AAAGGAGCAAGTACGCATTCAGTGTTGGAAACCTTGCAGATCTCTCATTA
TTTTGATGAAATTTTAACTGGTGTTTCTGGATTCGAGCGAAAACCACATC
CACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGATAAATCAATG
ACTTATTACATAGGAGATCGTCCCCTAGATTTGGAGGTTGCTCAAAATGC
TGGTATAAAATCCATAAACTTAAGGTTAGAGAATTCCAAAGAAAACTATA
ATATTTCAAGTCTCAAGGATATAATATCACTTGATTTCACTCGT

SEQ ID NO. 5611 

STRAIN M781 

AAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATTAATAGATTCGT
ATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGCTTA
ATATTTGATAAAGAATTAATCCATGAATATATTTTACAGGAATCAGTGGG
GCAATTATTGGTAAACCTTTCAGAGGAAGAGCAAATACCTCATGAAAAAC
TGAAAGCATATTTTACAAAAGAACAAGAAAGTCGAGATTyTAAAATACAT
TTAATGCCATATGCAAAAGAGATTTTAGAATGGACCAAAGAACAAGATAT
TCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTCAGTGTTGG
AAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCG
GGATTCGAGCGAAAACCACATCCACAAGGGATTAATTATTTAGTTAAACG
ATATTCTTTAGATAAATCAATGACTTATTAGATAGGAGATCGTCCACTAG
ATTTGGAGGTTGCTCAAAATGCTGGTATAAAATCCATAAACTTAAGGTTA
GAGAATTCCAAAGAAAACTATAATATTTCAAGTCTCAAAGATATAATATC
ACTTGATTTCACTCGT

SEQ ID NO. 5612 
STRAIN 2603 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5613 

STRAIN A909 frame: 1

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR 
SEQ ID NO. 5614

STRAIN H36B frame: 1

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5615 

STRAIN 18RS21 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5616 

STRAIN M732 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5617 

STRAIN COH1 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD

SEQ ID NO. 5618 

STRAIN CJB110 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR 

SEQ ID NO. 5619 

STRAIN 1169NT frame: 1

KKLTFIWDLDGTLIDSYVPIIEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5620 

STRAIN JM9130013 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR 

SEQ ID NO. 5621 

STRAIN 090 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR

SEQ ID NO. 5622 

STRAIN M781 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDXKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR 

SEQ ID NO: 5701 
STRAIN 2603

ATGCTTATGACAAAAATAATAGGACTGACAGGAGGGATAGCTTCT
GGAAAGTCAACGGTAACAAAAATAATACGAGAATCAGGTTTTAAAGTCATAGATGCGGAT
CAAGTGGTTCATAAATTGCAAGCTAAGGGTGGGAAACTTTACCAAGCTTTATTAGAATGG
TTGGGTCCCGAGATACTTGATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATG
ATTTTTGCTAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTCGT
CAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATATTTTTCATGGAT
ATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTTTGATGAGATTTGGTTGGTATTT
GTTGATAAAGAAAAACAATTACAACGATTAATGGCCCGTAACAACTACAGTCGAGAAGAA
GCAGAATTACGACTTTCACACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTT
ATTATTGACAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTCAA
CGTTTA

SEQ ID NO: 5702 

STRAIN 090 

AAGTCAACGGTAACAAAAATAATACGAGAATCAG
GTTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAG
GGTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACT
TGATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTG
CTAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATT
CGTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGAT
ATTTTTCGTGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGT
TTGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGA
TTAATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTC
ACACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTA
ATAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTT
CAACGTTTA

SEQ ID NO: 5703 

STRAIN A909 

AAGTCAACGGTAACAAAAATAATACGAGAATCAG

GTTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAG 
GGTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACT 
TGATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTG 
CTAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATT
CGTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGAT
ATTTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGT
TTGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGA
TTAATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTC
ACACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTG
ACAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTT
CAACGTTTA 

SEQ ID NO: 5704 

STRAIN H36B

AAGTCAACGGTAACAAAAATAATACGAGAATCAGG 

TTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGG
GTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTT
GATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGC
TAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTC
GTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATA
TTTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTT
TGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGAT
TAATGGCCCGTAACAACTACAGTCGAGAAGAAGCGGAATTACGACTTTCA
CACCAAATACCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGA
TAATAATGGTGATTTAATAACTTTAAAAGAGCAAATGTTGGATGCTCTTC 
AACGTTTA 

SEQ ID NO: 5705 

STRAIN 18RS21 

AAGTCAACGGTAACAAAAATAATACGAGAATCAGG

TTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGG 
GTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTT 
GATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGC 
TAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTC
GTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATA
TTTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTT
TGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGAT
TAATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCA
CACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGA
CAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTC
AACGTTTA 

SEQ ID NO: 5706 
STRAIN M732 

AAGTCAACGGTAACAAAAATAATACGAGAATCAGGTT
TTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGGGT 
GGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTTGA 
TGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGCTA 
ATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTCGT 
CAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATATT
TTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTTTG
ATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGATTA
ATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCACA
CCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGACA 
ATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTCAA 
CGTTTA 

SEQ ID NO: 5707 

STRAIN COH1 

AAGTCAACGGTAACAAAAATAATACGAGAATCAGGT

TTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGGG
TGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTTG
ATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGCT
AATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTCG
TCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATAT
TTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTTT
GATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGATT
AATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCAC
ACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGAC
AATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTCA
ACGTTTA

SEQ ID NO: 5708 

STRAIN M781

AAGTCAACGGTAACAAAAATAATACGAGAATCAGG

TTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGG
GTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTT
GATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGC
TAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTC
GTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATA
TTTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTT
TGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGAT
TAATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCA
CACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGA
CAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTC
AACGTTTA 

SEQ ID NO: 5709 

STRAIN CJB110 

AAGTCAACGGTAACAAAAATAATACGAGAA

TCAGGTTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGC
TAAGGGTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGA
TACTTGATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATT
TTTGCTAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTAT
CATTCGTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAG
AGATATTTTTCGTGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAA
TGGTTTGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACA
ACGATTAATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGAC
TTTCACACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATT
ATTAATAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGC
TCTTCAACGTTTA 

SEQ ID NO: 5710 

STRAIN 1169NT 

AAGTCAACGGTAACAAAAATAATACGAGAATCAGG

TTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGG
GTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTT
GATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGC
TAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTC
GTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATA
TTTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTT
TGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGAT
TAATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCA 
CACCAAATACCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGA 
TAATAATGGTGATTTAATAACTTTAAAAGAGCAAATGTTGGATGCTCTTC
AACGTTTA 

SEQ ID NO: 5711 

STRAIN JM9130013 

AAGTCAACGGTAACAAAAATAATACGAGAATCAGGT

TTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGGG
TGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTTG
ATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGCT
AATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTCG
TCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATAT
TTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTTT
GATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGATT
AATGGCCCGTAACAACTACAGTCGAGAAGAAGCGGAATTACGACTTTCAC
ACCAAATACCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGAT 
AATAATGGTGATTTAATAACTTTAAAAGAGCAAATGTTGGATGCTCTTCA 
ACGTTTA

SEQ ID NO: 5712 

STRAIN 2603 frame: 1

MLMTKIIGLTGGIASGKSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEI 
LDADGELDRPKLSQMIFANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLI 
EEKYIKWFDEIWLVFVDKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNN 
GDLITLKEQILDALQRL 

SEQ ID NO: 5713 

STRAIN 090 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFVDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIINNNGDLITLKEQILDALQR 
L 

SEQ ID NO: 5714 

STRAIN A909 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR 
L 

SEQ ID NO: 5715 

STRAIN H36B frame: 1

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQIPLTDKKSFASLIIDNNGDLITLKEQMLDALQR 
L 

SEQ ID NO: 5716 
STRAIN 18RS21 frame: 1

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5717 

STRAIN M732 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5718

STRAIN COH1 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5719 

STRAIN M781 frame: 1

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5720 

STRAIN CJB110 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFVDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIINNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5721 

STRAIN 1169NT frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQIPLTDKKSFASLIIDNNGDLITLKEQMLDALQR 
L 

SEQ ID NO: 5722 

STRAIN JM9130013 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQIPLTDKKSFASLIIDNNGDLITLKEQMLDALQR 
L 

SEQ ID NO. 5801 
STRAIN 2603 

ATGTTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTATGATTTTAGCCTTTTTATTG 
GTAAATAATAGTTATTTTAGACAGTTAATTGAAGAGCGGTCTAAACGTGAAACGGTAGTC 
CTTGTCATCATTTTCGGCTTGTTTGTTATTATATCTAATATAACAGGAATTGAAATAAAA 
GGGGATCGAAGTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACTT 
GCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCTCTGGTTGGA 
TCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCAAGGAAGCTTTTCAGGTTCT 
TTCTATATTGTCAGTTCAGTTCTAGTCGGCATTGTTAGCGGAAAGATTGGTGATAAGCTT 
AAGGAAAACCATCTCTACCCTTCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAA 
AGTATCCAGATGCTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTC 
ATTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATTTTGAAAACT 
TATTTGTCAAATGAAAGTCAGTTACGCGGAGTTCAAACGAGAGATGTTCTTGAATTGACT
CGACAGACTCTGCCCTACCTTAGACAAGGTTTGACACCGCAATCTGCTAGGAGCGTTTGC 
GAAATTATAAAGAGGCATACTAACTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTA 
TTAGCTCATATTGGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGAC 
WO 2004/018646
PCT/US2003/026827
SEQUENCE LISTING
TTATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAAGCGGCGATT
TCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGTAGTTCCTCTAAAAATAAAT
GATAAAACTGTGGGTGCCTTAAAAATGTACTTTGCAGGAGATAAGACAATGTCTGAGGTG
GAGGAAAACCTAGTCCTTGGTTTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATA
ACAGAGGAACAAAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATC
AACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGTATTGATTCT
GATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTTTAGAACAAGTTTGCAGGGT
GGTCAGGATCGTGAGGTAACGCTTGAGCAAGAAAAATCACATGTGGATGCTTATATGAAT
GTTGAAAAATTACGTTTCCCTGATAAATATCAGTTATCTTATGATATTAGTGCACCAGAA
AAAATGAAGTTACCACCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCT
TTCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGATGGTCATTAT
TATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGATACTATCATTGATAAATTA
GGTCAAGAAACAGTTGCAGAGAGTAAGGGTACAGGTACTGCTCTAGTTAATCTAAATAAC
AGGCTGAATTTATTATATGGTAGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGT
ACAAAAGTTTGGTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAAT 
TCT 

SEQ ID NO. 5802 

STRAIN 090

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTAT 
GATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATTG
AAGAGCGGTCTAAACGTGAAACGGTAGTACTTGTCATCATTTTCGGCTTG 
TTTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAAG 
TTTGGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCACTTG 
CTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCT
CTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCA 
AGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCA 
TTGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCT
TCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGAT
GCTATTTGTTGGTATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCA
TTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATT
TTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACGAG
AGATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTCAGACAAGGTT
TGACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATACT
AACTTTGATGCTGTAGGATTAACAGATCGGTCAAACGTATTAGCTCATAT
TGGTGTTGGCCATGATCACCATATTGCAGGACAACCAGTCAAAACAGACC
TATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAA
GCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGT
AGTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTACT
TTGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGGT
TTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAACA 
AAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATCA
ACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGT
ATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTT
TAGAACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAAG
AAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCT
GATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAGTT
ACCGCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTAGACATGCTT
TCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGAT
GGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGA
TACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGTA
CAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGT
AGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTG
GTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAATT
CT 

SEQ ID NO. 5803 

STRAIN A909

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTAT

GATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATTG
AAGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTTG
TTTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAAG
TTTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACTTG
CTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCT
CTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCA
AGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCA
TTGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCT
TCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGAT
GCTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCA
TTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATT
TTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACGAG
AGATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAGGTT
TGACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATACT
AACTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCATAT
TGGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGACT
TATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAA
GCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGT
AGTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTACT
TTGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGGT
TTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAACA
AAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATCA
ACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGT
ATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTT
TAGAACAAGTTTGCAGGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAAG
AAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCT
GATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAGTT
ACCACCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCTT
TCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGAT
GGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGA
TACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGTA
CAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGT
AGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTG
GTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAATT
CT 

SEQ ID NO. 5804 

STRAIN H36B 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTATG 

ATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATTGA
AGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTTGT
TTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAAGT
TTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACTTGC
TAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCTC
TGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCAA
GGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCAT
TGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCTT
CAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGATG
CTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCAT
TCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATTT
TGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACGAGA
GATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAGGTTT
GACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATACTA
ACTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCATATT
GGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGACTT
ATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAAG
CGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGTA
GTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTACTT
TGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGGTT
TAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAACAA
AATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATCAA
CCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGTA
TTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTTT
AGAACAAGTTTGCAGGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAAGA
AAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCTG
ATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAGTTA
CCACCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCTTT
CAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGATG
GTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGAT
ACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGTAC
AGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGTA
GTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTGG
TATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAATTC
T 

SEQ ID NO. 5805 
STRAIN 18RS21 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTATG
ATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTTAGACAGTTAATTGA
AGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTTGT 
TTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAAGT 
TTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACTTGC 
TAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCTC
TGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCAA
GGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCAT
TGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCTT
CAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGATG 
CTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCAT 
TCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATTT
TGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACGAGA
GATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAGGTTT
GACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATACTA
ACTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCATATT
GGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGACTT
ATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAAG
CGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGTA 
GTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTACTT 
TGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGGTT
TAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAACAA
AATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATCAA
CCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGTA 
TTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTTT 
AGAACAAGTTTGCAGGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAAGA
AAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCTG
ATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAGTTA
CCACCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCTTT
CAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGATG
GTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGAT 
ACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGTAC
AGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGTA 
GTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTGG 
TATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAATTC
T 

SEQ ID NO. 5806 

STRAIN M732 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTATGAT

TTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATTGAAG
AGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTTGTTT
GTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAAGTTT
GGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCACTTGCTA
ATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCTCTG
GTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCAAGG
AAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCATTG
TTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCTTCA
ACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGATGCT
ATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCATTC
CAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATTTTG
AAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACGAGAGA
TGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAGGTTTGA
CACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATACTAAC
TTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCATATTGG
TATTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGACTTAT
CTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAAGCG
GCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGTAGT
TCCTCTAAAAATAAATGATAAAACTGTGTGTGCCTTAAAAATGTACTTTG 
CAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGGTTTA 
GCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAACAAAA
TAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATCAACC
CTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGTATT
GATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTTTAG
AACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAAGAAA
AATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCTGAT
AAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAGTTACC
GCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCTTTCA
AAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGATGGT
CATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGATAC
TATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGGACAG
GTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGTAGT 
GTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTGGTA 
TCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAATTCT

SEQ ID NO. 5807 

STRAIN COH1 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTAT

TATGATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAA 
TTGAAGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGC 
TTGTTTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCG 
AAGTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCAC 
TTGCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGA 
CCTCTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTT 
TCAAGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCG 
GCATTGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTAC
CCTTCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCA 
GATGCTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTG 
TCATTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCG 
ATTTTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAAC 
GAGAGATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAG
GTTTGACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCAT
ACTAACTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCA
TATTGGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAG
ACTTATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGAT
AAAGCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTAT
TGTAGTTCCTCTAAAAATAAATGATAAAACTGTGTGTGCCTTAAAAATGT
ACTTTGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTT
GGTTTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGA
ACAAAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAA
TCAACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATC
CGTATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTT
TTTTAGAACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGC
AAGAAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTC
CCTGATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAA
GTTACCGCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATG
CTTTCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCA
GATGGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTC
AGATACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGG
GGACAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATAT
GGTAGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGT
TTGGTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTA
ATTCT

SEQ ID NO. 5808 

STRAIN M781

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTA 

TGATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATT
GAAGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTT
GTTTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAA
GTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCACTT
GCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACC 
TCTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTC 
AAGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGC 
ATTGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTAGCC
TTCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGA
TGCTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTC
ATTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGAT
TTTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACGA
GAGATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAGGT
TTGACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATAC
TAACTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCATA
TTGGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGAC
TTATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAA
AGCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTG
TAGTTCCTCTAAAAATAAATGATAAAACTGTGTGTGCCTTAAAAATGTAC
TTTGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGG
TTTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAAC
AAAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATC
AACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCG
TATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTT
TTAGAACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAA
GAAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCC
TGATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAGT
TACCGCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCT 
TTCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGA 
TGGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAG 
ATACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGG 
ACAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGG
TAGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTT
GGTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAAT
TCT 

SEQ ID NO. 5809 

STRAIN CJB110 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTAT 

GATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATTG
AAGAGCGGTCTAAACGTGAAACGGTAGTACTTGTCATCATTTTCGGCTTG
TTTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAAG
TTTGGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCACTTG
CTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCT
CTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCA
AGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCA
TTGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCT
TCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGAT
GCTATTTGTTGGTATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCA
TTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATT
TTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACGAG
AGATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTCAGACAAGGTT
TGACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATACT
AACTTTGATGCTGTAGGATTAACAGATCGGTCAAACGTATTAGCTCATAT
TGGTGTTGGCCATGATCACCATATTGCAGGACAACCAGTCAAAACAGACC
TATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAA
GCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGT
AGTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTACT
TTGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGGT
TTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAACA
AAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATCA
ACCCTCATTTTTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGT
ATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTT
TAGAACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAAG
AAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCT
GATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAGTT
ACCGCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTAGACATGCTT
TCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGAT
GGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGA
TACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGTA
CAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGT
AGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTG
GTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAATT
CT 

SEQ ID NO. 5810 

STRAIN 1169NT 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATT 

ATGATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAAT 
TGAAGAGCGGTCTAAACGTGAAACGGTAGTACTTGTCATCATTTTCGGCT 
TGTTTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGA
AGTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACT
TGCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGAC
CTCTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTT 
CAAGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGG 
CATTGTGAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACC 
CTTCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAG 
ATGCTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGT 
CATTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGA 
TTTTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACG
AGAGATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAGG
TTTGACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATA
CTAATTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCAT
ATTGGTGTTGGCCATGATCACCATATTGCAGGACAACCAGTCAAAACAGA
CCTATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATA
AAGCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATT 
GTAGTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTA 
CTTTGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTG 
GTTTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAA
CAAAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAAT
CAACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCC
GTATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTT
TTTAGAACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGCA
AGAAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCC
CTGATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAG
TTACCGCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGC
TTTTAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAG
ATGGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCA
GATACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGG
TACAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATG
GTAGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTT 
TGGTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAA 
TTCT 

SEQ ID NO. 5811

STRAIN JM9130013 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATT 

ATGATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAAT 
TGAAGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCT 
TGTTTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGA
AGTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACT 
TGCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGAC 
CTCTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTT 
CAAGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGG 
CATTGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACC
CTTCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAG 
ATGCTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGT 
CATTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGA
TTTTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACG 
AGAGATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAGG 
TTTGACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATA
CTAACTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCAT
ATTGGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGA 
CTTATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATA
AAGCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATT 
GTAGTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTA 
CTTTGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTG 
GTTTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAA 
CAAAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAAT 
CAACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCC 
GTATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTT 
TTTAGAACAAGTTTGCAGGGTGGTCAGGATCGTGAGGTAACGCTTGAGCA
AGAAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCC
CTGATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAG
TTACCACCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGC 
TTTCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAG
ATGGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCA 
GATACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGG
TACAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATG 
GTAGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTT
TGGTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAA 
TTCT 

SEQ ID NO. 5811 
STRAIN 2603 frame: 1

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIWPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT 
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5812 

STRAIN 090 frame: 1

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG 
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5813 

STRAIN A909 frame: 1

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5814 

STRAIN H36B frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS

SEQ ID NO. 5815 

STRAIN 18RS21 frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5816 

STRAIN M732 frame: 1

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE
IIKRHTNFDAVGLTDRSNVLAHIGIGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVCALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5817 

STRAIN COH1 frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVCALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS

SEQ ID NO. 5818 

STRAIN M781 frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVCALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT 
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS

SEQ ID NO. 5819 

STRAIN CJB110 frame: 1 
LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVTIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS

SEQ ID NO. 5820 

STRAIN 1169NT frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5821 

STRAIN JM9130013 frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS

SEQ ID NO. 5901 
STRAIN 2603 

ATGAATAAAAGAAGAAAATTATCAAAATTGAATGTAAAAAAACATCATTTAGCTTATGGA
GCTATCACTTTAGTAGCCCTTTTTTCATGTATTTTGGCTGTAATGGTCATCTTTAAAAGT 
TCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAAAAATCA 
AAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCT
TCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAG
CAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACC
CCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCT 
CAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCA
GCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATT
ATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTT
TTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTTAATTCAGCT
ATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTACTAG 

SEQ ID NO. 5902 

STRAIN JM9130013 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAA
AGCAGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGAATAAGGCAACAT
CTAAATCAAAAGTAGAAGGTGTAAAACAGGCTCCAAAACCAAGTTCTCAA
TCTACAGAAGCTAATTCTCAGCAACAAGTTAGTGCGAGTGAAGAGGCAGC
TGTAGAACAAGCAGTTGTAACAGAAAATACCCCTGCTACCAGTCAAGCAC
AACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAG 
CCGAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGTTATTGGCTC
AGCAGCAGCAGCACAAATGGCTGCTGCAACGGGAGTTCCTCAGTCTACTT 
GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAACGTTGCTAAT
GCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAAC 
AGCTACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTC
AAGGTTTATCAGCTTGGGGTTAC

SEQ ID NO. 5903 

STRAIN 1169NT reverse complement 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCC 
AAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCT
CCAAAACCTTCTCAGGCATCTAATGAAGTCCCAAAATCAAGTTCTCAATCTACAGAAGCT
AATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAACA 
GAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTAC
AAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGCG
GTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGG 
GAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCT 
TCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTT
AATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 5904 

STRAIN 18RS21 reverse complement 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTC
GCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAA 
AACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTA
CAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAG
TTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGA
CAACTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTG 
CAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGT
CTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCT 
CAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGG 
ATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC

SEQ ID NO. 5905 

STRAIN 090 reverse complement 

TAGCCAAAAAATCAAAAATGATTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAAC
AGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAG
AAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTG
TAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAA
CTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTGCAG 
GGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTA 
CTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAG 
GAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGA 

SEQ ID NO. 5906 

STRAIN A909 reverse complement

AAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCA
TCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTT
ACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACC
AGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAG
ACAAGTGGCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCA 
GCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGT 
GAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACG 
ATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGAATCAAGTTAATTCAGCTATTAAAGCT
TATCGTGCTCAAGGTTTATCA 

SEQ ID NO. 5907 

STRAIN CJB110 reverse complement 

AATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGA
CATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATG
AAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGA
GTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGG
CACAACAAGCTTATGCTGTTAGTGAGACAACTTATAGACCTGCTCAACACCAGACGAGTG
GCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAA
TGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAA 
ATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAG
GTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTG
CTCAAGGTTTATCAGCTTGGGGTTAC

SEQ ID NO. 5908

STRAIN COH1 reverse complement 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAA
AGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGA
TGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCA 
ATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACA
AGCAGTTGTAACAGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTAC
TGAGACAACTTACAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAA
TACTGCAGGGGCGGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCC 
TCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAA 
TGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGT
TCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGG
TTAC 

SEQ ID NO. 5909 

STRAIN H36B reverse complement
AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGC
AGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGT
AGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAG
TTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGT
AGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGC
TGTTACTGAGACAACTTATAGACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGTAA
TGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGG
AGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGT
TGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGC 
TACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTT

SEQ ID NO. 5910 

STRAIN M732 reverse complement 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGC
CAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGC
TCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGC
TAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAAC
AGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTA
CAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGC
GGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTG 
GGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGC
TTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGT
TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTA 

SEQ ID NO. 5911 

STRAIN M781 reverse complement 

TCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACA 
TCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAA
GCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGT
GAAGAGGCGGCTGTAGAACAAGCAGTTGTAACAGAAAATACCCCTGCTACCAGTCAGGCA
CAACAAACTTATGCTGTTACTGAGACAACTTACAAACCTGCTCAACACCAGACAAGTGGC
CAAGTATTGAGCAATGGAAATACTGCAGGGGCGGTCGGATCTGCTGCTGCAGCACAAATG
GCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAAT 
GGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGT 
TGGGGTTCAACAGCTACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCT
CAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 5912 
STRAIN 2603 frame: 1 

MNKRRKLSKLNVKKHHLAYGAITLVALFSCILAVMVIFKSSQVTTESLSKADKVRVAKKS
KMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVVTENT 
PATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQSTWEHI 
IARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRAQGLSAWGY 

SEQ ID NO. 5913 

STRAIN 1169NT frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEVPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

SEQ ID NO. 5914 

STRAIN 18RS21 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

SEQ ID NO. 5915 

STRAIN 2603 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

SEQ ID NO. 5916 

STRAIN 090 frame: 3 

AKKSKMIKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVV
TENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQST 
WEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQ 

SEQ ID NO. 5917

STRAIN A909 frame: 1 

KATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVVTENTPAT
SQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQSTWEHIIAR 
ESNGNPNVANASGASGLFQTMPGWGSTATVQNQVNSAIKAYRAQGLS 

SEQ ID NO. 5918 

STRAIN CJB110 frame: 3 

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS 
EEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQM
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA 
QGLSAWGY 

SEQ ID NO. 5919 

STRAIN COH1 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

SEQ ID NO. 5920 

STRAIN H36B frame: 1

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKA 

SEQ ID NO. 5921 

STRAIN M732 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV 
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWG 

SEQ ID NO. 5922 

STRAIN M781 frame: 4

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS 
EEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAVGSAAAAQM 
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA 
QGLSAWGY

SEQ ID NO. 5923

STRAIN JM9130013 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMNKATSKSKVEGVKQAPKPSSQSTEANSQQQVTASEE
AAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQPSGQVLSNGNTAGVIGSAAAAQMAA
ATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRAQG 
LSAWGY 

SEQ ID NO. 6001 
STRAIN 2603 

ATGAAAGAAAAACAGTCGAAAAGGCTTATTTATATACTACTGGTTGTTTCCATTATTTTT 
ATAAGTGTTTTTACATACAGTATTAGCCAGCCTTCTAAACTAGTTCCACCAAAAGAATTA
GTTATTCTAAGTCCAAATAGTCAAGCCATTTTAACAGGAACGATTCCAGCTTTTGAGGAA
AAATACGGTATAAAAGTTAAGCTTATTCAAGGTGGGACAGGGCAACTAATAGATAGATTA
AGTAAGGAGGGTAAGCAGTTGAAGGCGGATATTTTCTTTGGAGGAAATTATACGCAATTT
GAAAGTCATAAGGCATTGTTTGAGTCTTACGTATCAAAGAATGTTCATACTGTTATTCCA
GACTATATCCATCCAAGTGATACGGCGACACCTTATACTATAAATGGGAGTGTCTTGATT
GTAAATAACGAATTAGCTAAGGGACTTACCATCAAGAGTTATGAAGATTTATTACAGCCT
TCCTTAAAAGGTAAAATTGCCTTTGCAGATCCGAATACTTCCTCTAGTGCTTTCTCACAA
CTCACTAATATACTCTTGGCCAAGGGTGGTTACACCAATCCAAAAGCGTGGAACTATGTT
AAAAAGCTACAACATAATATTAATGCTATCAAATCTTCTAGCTCTTCAGAAGTTTATCAA
TCAGTTGCAGAAGGAAAAATGATTGTGGGGCTGACTTACGAAGACCCTAGTGTCAATTTG
CAAAAAAGTGGTGCCAATGTTTCTATTGTATATCCGACAGAAGGGACAGTTTTTGTCCCA
TCTTCGGTTGCAATTATAAAGAATGCTCCTTCTATGAAAGAAGCAAAGTTATTTATTAAT
TTTATGCTTTCTTTAGATGTTCAAAATGCCTTTGGGCAGTCAACGAGTAACCGACCTATT
CGTAAAGATGCCCAAACGAGTAATGGCATGAAAGCTTTAAAGGATATTGCTACTCTTAAA
GAAGATTATCGCTATGTCACTAAGCATAAGGGCCAAATCCTTAAAACCTATAATCGTATT
CGTAGAAATGCTGAT

SEQ ID NO. 6002 

STRAIN 090 

CAGCCTTCTAAACTACTTCCACCAAAAGAATTAGTTATTCTAAGT
CCAAATAGTCAAGCCATTTTAACAGGAACGATTCCAGCTTTTGAGGAAAA
ATACGGTATAAAAGTTAAGCTTATTCAAGGTGGGACAGGGCAACTAATAG
ATAGATTAAGTAAGGAGGGTAAGCAGTTGAAGGCGGATATTTTCTTTGGA
GGAAATTATACGCAATTTGAAAGTCATAAGGCATTGTTTGAGTCTTACGT
ATCAAAGAATGTTCATACTGTTATTCCAGACTATATCCATCCAAGTGATA
CGGCGACACCTTATACTATAAATGGGAGTGTCTTGATTGTAAATAACGAA
TTAGCTAAGGGACTTACCATCAAGAGTTATGAAGATTTATTACAGCCTTC
CTTAAAAGGTAAAATTGCCTTTGCAGATCCGAATACTTCCTCTAGTGCTT
TCTCACAACTCACTAATATACTCTTGGCCAAGGGTGGTTACACCAATCCA
AAAGCGTGGAACTATGTTAAAAAGCTACAACATAATATTAATGCTATCAA
ATCTTCTAGCTCTTCAGAAGTTTATCAATCAGTTGCAGAAGGAAAAATGA
TTGTGGGGCTGACTTACGAAGACCCTAGTGTCAATTTGCAAAAAAGTGGT
GCCAATGTTTCTATTGTATATCCGACAGAAGGGACAGTTTTTGTCCCATC
TTCGGTTGCAATTATAAAGAATGCTCCTTCTATGAAAGAAGCAAAGTTAT
TTATTAATTTTATGCTTTCTTTAGATGTTCAAAATGCCTTTGGGCAGTCA
ACGAGTAACCGACCTATTCGTAAAGATGCCCAAACGAGTAATGGCATGAA
AGCTTTAAAGGATATTGCTACTCTTAAAGAAGATTATCGCTATGTCACTA
AGCATAAGGGCCAAATCCTTAAAACCTATAATCGTATTCGTAGAAATGCT
GAT 

SEQ ID NO. 6003 

STRAIN A909

CAGCCTTCTAAACTACTTCCACCAAAAGAATTAG
TTATTCTAAGTCCAAATAGTCAAGCCATTTTAACAGGAACGATTCCAGCT
TTTGAGGAAAAATACGGTATAAAAGTTAAGCTTATTCAAGGTGGGACAGG
TCAACTAATAGATAGATTAAGTAAGGAGGGTAAGCAGTTGAAGGCGGATA
TTTTCTTTGGAGGAAATTATACGCAATTTGAAAGTCATAAGGCATTGTTT
GAGTCTTACGTATCAAAGAATATTCATACTGTTATTCCAGATTATATCCA
TCCGAGTGATACGGCGACACCTTATACTATAAATGGGAGTGTCTTGATTG
TAAATAACGAATTAGCTAAGGGACTTACCATCAAGAGTTATGAAGATTTA
TTACAGCCTTCCTTAAAAGGTAAAATTGCCTTTGCAGATCCGAATACTTC
CTCTAGTGCTTTCTCACAACTCACTAATATACTCTTGGCCAAGGGTGGTT